arxiv:2510.19365

The Massive Legal Embedding Benchmark (MLEB)

Published on Oct 22

Abstract

AI-generated summary: MLEB is the largest open-source benchmark for legal information retrieval, encompassing multiple jurisdictions, document types, and task types.

We present the Massive Legal Embedding Benchmark (MLEB), the largest, most diverse, and most comprehensive open-source benchmark for legal information retrieval to date. MLEB consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering). Seven of the datasets in MLEB were newly constructed in order to fill domain and jurisdictional gaps in the open-source legal information retrieval landscape. We document our methodology in building MLEB and creating the new constituent datasets, and release our code, results, and data openly to assist with reproducible evaluations.

Community

Hey all,
This is my first-ever paper! In it, we present the Massive Legal Embedding Benchmark (MLEB), a new open-source benchmark for legal information retrieval. It consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering).

This paper documents our methodology in creating MLEB as well as our findings. We've openly released our datasets here on Hugging Face and have made our code available on GitHub: https://github.com/isaacus-dev/mleb

Hey @akhaliq, would you mind approving this paper? 🙏🏻

Amazing Work
