The main motivation for using USearch is that CPU compute is cheap and easy to scale.
Blog post: https://huggingface.co/blog/adlumal/lightning-fast-vector-search-for-legal-documents
You missed the most important disadvantage of proprietary closed embedding models served in SaaS form: vendor lock-in. If you stop paying for the service, you end up with an almost useless vector database. You can't produce new vectors for queries in RAG (so your RAG stops working), and you can't switch models while keeping the existing vectors, because the latent spaces of different models are incompatible. The only thing you can still do is compare the vectors you already own with each other, a pity. Proprietary embedding models should be adopted only with great care.
Thanks for the thoughtful comment! For now, I'm of the opinion that SaaS embedding APIs are cheap enough that even a large dataset can be re-vectorised. For example, for the 143k chunks the cost was somewhere between $6 and $30 (from memory). That's every Australian High Court judgement up to 2023. Personally, I think of the vectors themselves as essentially disposable, since better models come out every month or so. I know not everyone shares that mindset, and for ultimate control you'd definitely want to go local.
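To make the "cheap enough to re-vectorise" point concrete, here is a back-of-envelope cost sketch. The chunk count comes from the comment above; the tokens-per-chunk figure and the per-million-token price are hypothetical assumptions for illustration, not the actual rates used:

```python
def embedding_cost(num_chunks: int, avg_tokens_per_chunk: int,
                   price_per_million_tokens: float) -> float:
    """Estimated API cost in dollars to embed an entire corpus."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * price_per_million_tokens

# 143k chunks (from the comment above) at an assumed ~500 tokens each,
# priced at a hypothetical $0.10 per 1M tokens:
print(round(embedding_cost(143_000, 500, 0.10), 2))  # → 7.15
```

With those assumed numbers the full corpus lands at around $7, which is consistent with the $6–$30 range recalled above; the spread mostly reflects which model tier you pick.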