From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)

Cache management

Once set up, a cache has a long life, and it may not achieve optimal behavior right from the start. Let's go through some best practices to maximize the effectiveness of caching with vector databases. First, measure the cache hit ratio. This is the ratio between the number of prompts served from the cache and the total number of prompts. The higher the hit ratio, the more efficient the cache is. Some use cases benefit a lot because users ask similar questions, while other use cases may not benefit at all. Next, it's important to find the right distance threshold for similarity matching. If the distance threshold is too small, fewer prompts will match a cached entry and we will use the LLM more often. If the distance threshold is too high, we will be returning inaccurate results from the cache. It's important to run benchmarks with a dataset of prompts and the right responses, and determine the right threshold. At this value, the cache should return accurate responses while maximizing the…
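To make the two measurements concrete, here is a minimal sketch in Python of a hit ratio calculation and a threshold benchmark. Everything in it is an assumption for illustration: the names `Entry`, `lookup`, and `sweep` are hypothetical, the two-dimensional vectors stand in for real embeddings, the in-memory list stands in for a vector database, and a cached response counts as accurate only when it exactly matches the expected one.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Entry:
    """A cached prompt embedding and its stored response (hypothetical)."""
    vector: list
    response: str


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def hit_ratio(hits, total):
    """Prompts served from the cache divided by total prompts."""
    return hits / total if total else 0.0


def lookup(cache, query, threshold):
    """Return the nearest cached response if within the threshold, else None."""
    best = min(cache, key=lambda e: cosine_distance(e.vector, query))
    if cosine_distance(best.vector, query) <= threshold:
        return best.response
    return None  # cache miss: the caller would fall back to the LLM


def sweep(cache, benchmark, thresholds):
    """For each threshold, return (threshold, hit ratio, accuracy of hits).

    Accuracy is None when the threshold produced no hits at all.
    """
    results = []
    for t in thresholds:
        hits = correct = 0
        for query, expected in benchmark:
            response = lookup(cache, query, t)
            if response is not None:
                hits += 1
                correct += response == expected
        accuracy = correct / hits if hits else None
        results.append((t, hit_ratio(hits, len(benchmark)), accuracy))
    return results


# Toy data (assumed): two cached answers and three benchmark prompts,
# the last of which is a new question that should NOT be served from cache.
cache = [Entry([1.0, 0.0], "A"), Entry([0.0, 1.0], "B")]
benchmark = [
    ([0.99, 0.10], "A"),
    ([0.10, 0.99], "B"),
    ([0.70, 0.70], "C"),
]

for threshold, ratio, accuracy in sweep(cache, benchmark, [0.001, 0.05, 0.5]):
    print(f"threshold={threshold} hit_ratio={ratio:.2f} accuracy={accuracy}")
```

On this toy data the smallest threshold produces no hits, the middle one serves the two similar prompts correctly, and the largest one also serves the unrelated prompt from the cache with a wrong answer. A real benchmark would follow the same pattern with actual embeddings and a larger prompt set.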
