From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)

Distance measure considerations

When doing semantic search with vector databases, a key design consideration is the distance measure, and it's critical to understand how that measure behaves for a specific use case. As seen in the earlier code examples, a vector search always returns hits as long as there are records in the database. If we set a limit of 10 in the query, it returns 10 records whenever the database holds at least 10. The results are sorted by the distance between the embedding of the search string and the embeddings of the strings stored in the database. So how do we determine whether there is actually a match? We need a distance or similarity threshold. For a distance measure, this is the maximum distance below which we consider a result to be a match. When a search is executed in Milvus, we can set the radius search parameter to this value so that the search only returns results whose distance is below the radius. For similarity metrics such as inner product or cosine, where a higher score means more similar, the direction flips and Milvus returns results whose score is above the radius. What exactly do we mean by similar when comparing two strings? How close…
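To make the difference between "top-k" and "above the threshold" concrete, here is a minimal brute-force sketch in plain NumPy, not Milvus code. It assumes small in-memory vectors and two metrics, Euclidean distance (lower is closer) and cosine similarity (higher is closer). The names `top_k` and `apply_radius` are hypothetical helpers for illustration. `top_k` always returns up to `k` hits regardless of how far away they are, while `apply_radius` mimics the radius cutoff described above, flipping the comparison for the similarity metric. The exact score scale Milvus reports may differ from this sketch, so any real threshold should be calibrated on the scores the database actually returns.

```python
import numpy as np


def top_k(query, vectors, k, metric="L2"):
    """Hypothetical brute-force search: returns min(k, len(vectors)) hits,
    sorted from most to least similar, as (index, score) pairs."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    if metric == "L2":
        # Euclidean distance: smaller score means more similar.
        scores = np.linalg.norm(vectors - query, axis=1)
        order = np.argsort(scores)
    elif metric == "COSINE":
        # Cosine similarity: larger score means more similar.
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        scores = (vectors @ query) / norms
        order = np.argsort(-scores)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return [(int(i), float(scores[i])) for i in order[:k]]


def apply_radius(hits, radius, metric="L2"):
    """Hypothetical radius filter: keeps only hits that count as matches."""
    if metric == "L2":
        # Distance metric: a match must be closer than the radius.
        return [(i, s) for i, s in hits if s < radius]
    if metric == "COSINE":
        # Similarity metric: a match must score higher than the radius.
        return [(i, s) for i, s in hits if s > radius]
    raise ValueError(f"unsupported metric: {metric}")


if __name__ == "__main__":
    records = [[0, 0], [1, 0], [0, 3], [5, 5]]
    # A limit of 10 still returns every record, even the distant ones.
    print(top_k([0, 0], records, 10))
    # Only hits within the radius are treated as real matches.
    print(apply_radius(top_k([0, 0], records, 3), radius=2.0))
```

The two-step design mirrors the point in the transcript: the ranking step answers "which records are closest?", and only the threshold step answers "is anything close enough to count as a match?".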