[Feature] Add vector similarity search support #16603
Comments
Hi, the vector option you are talking about there isn't used for this, I'm afraid. For reference, you can use something like: https://github.com/nmslib/hnswlib Or you can try using The key issue from our perspective is how to index that so we'll not need to scan through all the results.
Thank you for the transparency. I agree that the index part is the trickiest (sure, it's based on calculating hashes on the vectors, but the devil's in the details). Then there's a vast field of possible algorithms for calculating the similarity scores. Predicting which approach will be the most widely used is hard, and I'm no expert, but at first glance, the CosmosDB and pgvector approaches would satisfy the majority of cases. Thanks for the link. nmslib seems to be good for approximate search methods (I haven't used it, though). As for SimHash (also MinHash and other hash-based algorithms), my gut feeling is they require significant work in tuning the hash function (introducing weights into the computation, etc.) to handle my feature vectors. Nonetheless, they aren't off the table.
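To make the hash-based idea concrete, here is a minimal sketch of random-hyperplane hashing, a SimHash-style scheme for real-valued vectors. It is not how RavenDB or any library mentioned in this thread indexes data; the function names (`make_hyperplanes`, `signature`, `hamming`), the 16-bit signature size, and the sample vectors are all assumptions chosen for illustration. Vectors pointing in similar directions tend to get signatures that differ in few bits, so an index could bucket by signature instead of scanning every record.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplane normals of dimension `dim` (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec, planes):
    """Pack one bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    sig = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


planes = make_hyperplanes(dim=3, bits=16)

base = [2.01, 20.85, 14.05]
scaled = [2 * x for x in base]      # same direction, different magnitude
opposite = [-x for x in base]       # opposite direction

sig_base = signature(base, planes)
sig_scaled = signature(scaled, planes)
sig_opposite = signature(opposite, planes)

print(hamming(sig_base, sig_scaled), hamming(sig_base, sig_opposite))
```

Because only the sign of each projection is kept, the signature ignores magnitude and approximates angular (cosine) similarity; the tuning concern raised above (weights, number of bits) still applies to real feature vectors.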
@ayende,
If I want to store feature vectors (a numeric array, e.g. [2.01, 20.85, 14.05]) in the DB, I'd like to query other records (with arrays of the same dimension) similar to the selected one(s), with a calculated similarity score (e.g. so I could tell that array 1 is 80% similar to array 2). It's expected that the result set would be calculated with a nearest neighbor search (e.g. the kNN algorithm with cosine similarity as the most popular implementation, but there are many others).
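As a rough illustration of the request, the sketch below does a brute-force k-nearest-neighbor lookup with cosine similarity in plain Python. It is not a RavenDB API; the function names, the record IDs, and the convention of scoring zero vectors as 0.0 are assumptions made for this example. It scans every record, which is exactly the cost an index would need to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, records, k=3):
    """Return the k (record_id, score) pairs most similar to query, via a full scan."""
    scored = [(rid, cosine_similarity(query, vec)) for rid, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical records keyed by document ID.
records = {
    "doc/1": [2.01, 20.85, 14.05],
    "doc/2": [4.02, 41.70, 28.10],  # roughly the same direction as doc/1
    "doc/3": [20.0, 1.0, 0.5],      # points in a very different direction
}

top = nearest_neighbors(records["doc/1"], records, k=2)
for record_id, score in top:
    print(f"{record_id}: {score:.0%} similar")
```

A score of 1.0 means the vectors point in the same direction, so multiplying by 100 gives the kind of percentage described above.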
Current options in Raven
RavenDB already provides a Vector Index, mainly used for text similarity. There may be a way to extend it to handle numeric values, but nothing seems to be available out of the box.
Other DB solutions:
Vald (ANN algo only, 1K stars, Go), no .NET Lib