From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)


Populate the Milvus database


Now that the input data for the knowledge base is ready, let's load it into the Milvus database, which serves as the knowledge base. The steps are straightforward and follow the earlier videos. We initialize the collection object for the RAG collection. We then insert the data we set up earlier in the video and flush the inserted data. Next, we build an index on the RAG embedding field. The index is of type IVF_FLAT and uses the L2 metric type for comparing distances. Let's run this code now.

For small documents or datasets, this process takes only a short time. For large datasets spanning multiple documents, it needs to be scheduled as a batch process. When doing batch processing, it's recommended to flush only at periodic intervals, because flushing consumes a lot of resources and results in fragmentation of data. Indexes can be created either before or after the data is inserted. Also, when existing…
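The steps described above (insert, flush at intervals, then build an IVF_FLAT index with the L2 metric) can be sketched roughly as follows. This is an illustration, not the course's actual code. The `populate` helper, the `rag_embedding` field name, the batch sizes, and `nlist=128` are all assumptions. The sketch only relies on the `insert`, `flush`, and `create_index` methods that a pymilvus `Collection` provides, and it assumes the rows are dicts, which recent pymilvus releases accept for inserts.

```python
from typing import Any, Dict, Iterator, List, Sequence

# Index settings named in the transcript: IVF_FLAT with the L2 metric.
# nlist=128 is an assumed value; the transcript does not state one.
INDEX_PARAMS: Dict[str, Any] = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}


def chunked(rows: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def populate(
    collection: Any,
    rows: Sequence[Dict[str, Any]],
    embedding_field: str = "rag_embedding",  # hypothetical field name
    batch_size: int = 1000,                  # assumed batch size
    flush_every: int = 10,                   # assumed flush interval, in batches
) -> int:
    """Insert rows in batches, flush periodically, then build the index."""
    inserted = 0
    pending = False  # True while inserted data has not been flushed yet
    for batch_no, batch in enumerate(chunked(rows, batch_size), start=1):
        collection.insert(list(batch))
        inserted += len(batch)
        pending = True
        # Flush only at intervals, since each flush is expensive.
        if batch_no % flush_every == 0:
            collection.flush()
            pending = False
    if pending:
        collection.flush()  # flush whatever remains after the last batch
    # Here the index is built after the inserts; it could also be built before.
    collection.create_index(field_name=embedding_field, index_params=INDEX_PARAMS)
    return inserted


# Possible usage against a running Milvus server (names are hypothetical):
#   from pymilvus import connections, Collection
#   connections.connect(host="localhost", port="19530")
#   populate(Collection("rag_collection"), rows)
```

For a small dataset, a single insert followed by one flush is enough. The interval logic only matters once the load runs as a scheduled batch job.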
