From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)

LLMs and caching

Let's now explore how to use a vector database to cache prompts and responses from large language models, or LLMs for short. First, let's review some shortcomings of LLMs and how caching can help with these issues. Large language models have taken the world by storm in 2023, and there is huge interest in using them for business purposes. A lot of innovation is happening, and several business applications are being built that are powered by LLMs. But the problem is the cost of LLMs. It takes a lot of resources to build, deploy, maintain, and use an LLM, so businesses are staying away from building their own models from scratch. On the other hand, when they use cloud LLMs, the cost per inference call is also high. This restricts LLMs to only those use cases where the returns justify the cost. LLMs also generate one token at a time due to how the decoder in the transformer architecture functions. This results in high latency, especially when responses are long. How can caching help? In a…
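To make the caching idea concrete, here is a minimal sketch of a semantic cache in Python. This is not the course's implementation: a plain in-memory list stands in for the vector database, and `embed` and `call_llm` are hypothetical placeholders for whatever embedding model and LLM client you use. The cache embeds each incoming prompt, looks for a previously seen prompt whose embedding is similar enough, and only pays for an LLM call on a cache miss.

```python
import numpy as np

# Hypothetical placeholders -- swap in your real embedding model and LLM client.
def embed(text: str) -> np.ndarray:
    """Return an embedding vector for the given text (placeholder)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Send the prompt to the LLM and return its response (placeholder)."""
    raise NotImplementedError

class SemanticCache:
    """Stores (prompt embedding, response) pairs and answers similar prompts from cache."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold              # minimum cosine similarity for a cache hit
        self.embeddings: list[np.ndarray] = []  # in-memory stand-in for a vector database
        self.responses: list[str] = []

    def _best_match(self, query: np.ndarray) -> tuple[int, float]:
        # Linear scan for the most similar cached prompt embedding.
        best_idx, best_sim = -1, -1.0
        for i, vec in enumerate(self.embeddings):
            sim = float(np.dot(query, vec) /
                        (np.linalg.norm(query) * np.linalg.norm(vec)))
            if sim > best_sim:
                best_idx, best_sim = i, sim
        return best_idx, best_sim

    def query(self, prompt: str) -> str:
        query_vec = embed(prompt)
        if self.embeddings:
            idx, sim = self._best_match(query_vec)
            if sim >= self.threshold:
                return self.responses[idx]       # cache hit: skip the LLM entirely
        response = call_llm(prompt)              # cache miss: one inference call
        self.embeddings.append(query_vec)        # remember this prompt for next time
        self.responses.append(response)
        return response
```

A real vector database replaces the linear scan with an indexed similarity search, which is what keeps lookups fast as the cache grows; the hit/miss logic stays the same.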
