CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents Paper • 2506.03822 • Published Jun 4
MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR Article • Oct 20, 2024
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024
DocGraphLM: Documental Graph Language Model for Information Extraction Paper • 2401.02823 • Published Jan 5, 2024