adlumal posted an update 9 days ago
I benchmarked embedding APIs for speed, compared local vs hosted models, and tuned USearch for sub-millisecond retrieval on 143k chunks using only a CPU. The post walks through the results, trade-offs, and what I learned about embedding API terms of service.
The main motivation for using USearch is that CPU compute is cheap and easy to scale.

Blog post: https://huggingface.co/blog/adlumal/lightning-fast-vector-search-for-legal-documents
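
For the curious, here is a minimal sketch of the kind of USearch setup involved. The dimensionality, metric, and tuning parameters below are illustrative assumptions, not the exact configuration from the blog post:

```python
import numpy as np
from usearch.index import Index

# Illustrative assumptions: 768-dim embeddings, cosine metric, f16 storage.
# The blog's actual model, dimensionality, and tuning may differ.
ndim = 768
index = Index(
    ndim=ndim,
    metric="cos",          # cosine similarity
    dtype="f16",           # half-precision storage to cut memory use
    connectivity=16,       # HNSW graph degree (speed/recall trade-off)
    expansion_search=64,   # search-time beam width
)

# Index a batch of pre-computed chunk embeddings (random here as a stand-in).
num_chunks = 143_000
keys = np.arange(num_chunks)
vectors = np.random.rand(num_chunks, ndim).astype(np.float32)
index.add(keys, vectors)

# Query: embed the user's question with the same model, then search on CPU.
query = np.random.rand(ndim).astype(np.float32)
matches = index.search(query, 10)   # top-10 nearest chunks
print(matches.keys, matches.distances)
```

With f16 storage, an index at this scale should fit comfortably in RAM on a single machine, which is presumably what makes the CPU-only approach attractive.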

You missed the most important disadvantage of proprietary closed embedding models served in SaaS form: vendor lock-in. Once you stop paying for the service, you end up with an almost useless vector database. You can't produce new vectors for queries in RAG (so your RAG stops working), and you can't switch models while keeping the existing vectors, because the latent spaces of different models are incompatible. The only thing you can still do is compare the vectors you already own with each other, a pity. Proprietary embedding models should be adopted only with great care.


Thanks for the thoughtful comment! For now, I'm of the opinion that SaaS embedding APIs are cheap enough that even a large dataset can be re-vectorised. For example, vectorising the 143k chunks cost somewhere between roughly $6 and $30 (from memory). That's every High Court judgement up to 2023 in Australia. Personally, I think of the vectors themselves as essentially disposable, since better models come out every month or so. I know not everyone shares that mindset, and for ultimate control you'd definitely want to go local.
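
To make the "re-vectorising is cheap" point concrete, here is a rough back-of-envelope sketch. The chunk size and per-token price are hypothetical assumptions for illustration, not figures from the post:

```python
# Hypothetical assumptions: ~500 tokens per chunk and a hosted embedding
# API priced at $0.10 per million input tokens. Actual prices vary by vendor.
num_chunks = 143_000
tokens_per_chunk = 500
price_per_million_tokens = 0.10

total_tokens = num_chunks * tokens_per_chunk                # ~71.5M tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"~{total_tokens / 1e6:.1f}M tokens -> ~${cost:.2f}")  # ~71.5M tokens -> ~$7.15
```

Under these assumptions the whole corpus re-embeds for single-digit dollars, which is consistent with the $6 to $30 range quoted above once different vendors and chunk lengths are factored in.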