DeAR-3B-Reranker-CE-v1
Part of the DeAR-Reranking collection: DeAR (Deep Agent Rank): Dual-Stage Document Reranking with Reasoning Agents, accepted at EMNLP Findings 2025.
DeAR-3B-Reranker-CE-v1 is an efficient 3B-parameter neural reranker trained with binary cross-entropy (BCE) loss and knowledge distillation from a 13B teacher. It provides fast, reliable reranking for production environments where speed and efficiency are critical.
- **Ultra Fast**: 1.5s inference for 100 documents (best in the DeAR family)
- **Memory Efficient**: runs on a single 16GB GPU
- **Production Ready**: stable training with BCE loss
- **Cost Effective**: lower computational costs
- **Binary Classification**: probabilistic relevance scores
Quick start:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
)
model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."
inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length",
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Relevance score: {score}")
```
Batch reranking:

```python
import torch
from typing import List, Tuple

@torch.inference_mode()
def fast_rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 128):
    """Fast reranking optimized for the 3B model. `docs` is a list of (title, passage) pairs."""
    device = next(model.parameters()).device
    scores = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {t} {p}" for t, p in batch]
        # Tokenize
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        # Score
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    # Rank document indices by descending score
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)

# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul National University"),
]
ranking = fast_rerank(tokenizer, model, query, docs, batch_size=128)
print(ranking)
# Example output (DeAR-3B-CE):
# [(0, -6.0625), (2, -11.125), (1, -12.0625)]
```
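Since the head is trained with a BCE objective, the raw logits above can be mapped to probabilities with a sigmoid. A minimal sketch; the helper name `to_probabilities` is ours for illustration, not part of the released API:

```python
import torch

def to_probabilities(scores):
    """Map raw relevance logits to [0, 1] via a sigmoid (BCE-trained head)."""
    return torch.sigmoid(torch.tensor(scores)).tolist()

# Using the example logits from above; the relative order is unchanged,
# only the scale becomes interpretable as P(relevant).
print(to_probabilities([-6.0625, -11.125, -12.0625]))
```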
```python
# Optimize for maximum throughput
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ce-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# Compile for a 20-30% speedup (PyTorch 2.0+)
if hasattr(torch, 'compile'):
    model = torch.compile(model, mode="max-autotune")

# Use larger batches for throughput
batch_size = 128  # the 3B model can handle larger batches
```
Training configuration:

```json
{
  "base_model": "meta-llama/Llama-3.2-3B",
  "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
  "loss": "Binary Cross-Entropy",
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1
  },
  "learning_rate": 1e-4,
  "batch_size": 4,
  "gradient_accumulation": 2,
  "epochs": 2,
  "max_length": 228,
  "bf16": true
}
```
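For intuition, here is a minimal sketch of how a BCE objective can be combined with temperature-scaled distillation from the teacher's relevance scores, using the `temperature` and `alpha` above. This is our illustrative reconstruction under common KD conventions, not the released training code:

```python
import torch
import torch.nn.functional as F

def distill_bce_loss(student_logits, teacher_logits, labels,
                     temperature=2.0, alpha=0.1):
    """Illustrative sketch: hard-label BCE plus a soft distillation term.

    student_logits, teacher_logits: (batch,) relevance logits
    labels: (batch,) binary relevance labels in {0, 1}
    """
    # Hard-label term: standard binary cross-entropy on ground truth
    hard = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    # Soft term: match the teacher's temperature-softened relevance probabilities
    soft = F.binary_cross_entropy_with_logits(
        student_logits / temperature,
        torch.sigmoid(teacher_logits / temperature),
    )
    # alpha weights the distillation signal against the hard labels
    return (1 - alpha) * hard + alpha * soft
```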
TREC Deep Learning results:

| Dataset | NDCG@10 | NDCG@20 | MRR@10 |
|---|---|---|---|
| DL19 | 70.8 | 67.3 | 83.9 |
| DL20 | 68.9 | 65.8 | 81.7 |
BEIR results (NDCG@10):

| Dataset | NDCG@10 |
|---|---|
| MS MARCO | 65.3 |
| NQ | 48.7 |
| HotpotQA | 57.9 |
| FiQA | 43.6 |
| ArguAna | 55.8 |
| SciFact | 70.2 |
| TREC-COVID | 81.8 |
| NFCorpus | 37.2 |
| Average | 41.7 |
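For reference, NDCG@10 rewards placing relevant documents near the top of the ranked list. A minimal sketch of one common (linear-gain) variant of the metric, not tied to this repository's evaluation code:

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance labels (linear-gain variant)."""
    def dcg(rels):
        # Each relevance is discounted by the log of its rank position
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that puts the only relevant document first scores 1.0
print(ndcg_at_k([1, 0, 0]))  # 1.0
print(ndcg_at_k([0, 1, 0]))  # ~0.63
```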
Efficiency vs. DeAR-8B-CE:

| Metric | 3B-CE | 8B-CE | Improvement |
|---|---|---|---|
| Inference (100 docs) | 1.5s | 2.2s | 1.5x faster |
| Throughput | 67 docs/s | 45 docs/s | 1.5x |
| GPU Memory | 12GB | 18GB | 33% less |
| Model Size | 6GB | 16GB | 62% smaller |
Comparison with other rerankers (NDCG@10; speed is seconds per 100 documents):

| Model | Loss | DL19 | DL20 | Speed (s) |
|---|---|---|---|---|
| DeAR-3B-CE | BCE | 70.8 | 68.9 | 1.5 |
| DeAR-3B-RankNet | RankNet | 71.2 | 69.4 | 1.5 |
| MonoT5-3B | - | 71.8 | 68.9 | 3.5 |
Key Advantages: the fastest inference in the DeAR family, a single-GPU (16GB) memory footprint, and stable, production-ready BCE training.

Best for: latency-sensitive, high-throughput production reranking where cost and efficiency matter most.

Consider alternatives for: workloads that need the highest possible accuracy, where a larger DeAR model (e.g., the 8B variants) may be a better fit; see the comparison tables above.
Production API example:

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()

# Load model once at startup
tokenizer, model = None, None

@app.on_event("startup")
async def load_model():
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-3b-reranker-ce-v1")
    model = AutoModelForSequenceClassification.from_pretrained(
        "abdoelsayed/dear-3b-reranker-ce-v1",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    model.eval()
    if hasattr(torch, 'compile'):
        model = torch.compile(model)

class RerankRequest(BaseModel):
    query: str
    documents: List[str]

@app.post("/rerank")
async def rerank(request: RerankRequest):
    # fast_rerank expects (title, passage) pairs; use empty titles for plain text
    ranking = fast_rerank(tokenizer, model, request.query,
                          [("", doc) for doc in request.documents])
    return {"ranking": ranking}
```
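A quick way to exercise the endpoint from a client. The host and port are assumptions for illustration; adjust them to your deployment:

```python
import requests

# Hypothetical local deployment of the FastAPI server above
resp = requests.post(
    "http://localhost:8000/rerank",
    json={
        "query": "When did Thomas Edison invent the light bulb?",
        "documents": [
            "Thomas Edison invented the light bulb in 1879",
            "Coffee is good for diet",
        ],
    },
)
print(resp.json())  # e.g. {"ranking": [[0, ...], [1, ...]]}
```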
Batch processing example:

```python
import pandas as pd
from tqdm import tqdm

# Load queries and documents; the 'documents' column should hold
# lists of (title, passage) pairs as expected by fast_rerank
df = pd.read_csv("queries_docs.csv")

results = []
for _, row in tqdm(df.iterrows(), total=len(df)):
    ranking = fast_rerank(tokenizer, model, row['query'], row['documents'])
    results.append({
        'query_id': row['query_id'],
        'ranking': ranking,
    })

# Save results
pd.DataFrame(results).to_csv("reranked.csv", index=False)
```
Architecture:

```
Input: "query: [Q] [SEP] document: [D]"
        ↓
LLaMA-3.2-3B (24 layers, 3072 hidden)
        ↓
[CLS] Token Pooling
        ↓
Linear(3072 → 1)
        ↓
Binary Relevance Score
```
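As a rough illustration of the head at the bottom of this diagram, the sequence-classification wrapper pools a single hidden state and projects it to one logit. A minimal sketch; the module name and the exact pooling position are our assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class PointwiseRerankHead(nn.Module):
    """Illustrative sketch of the pooling + linear scoring head."""

    def __init__(self, hidden_size: int = 3072):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) from the LLaMA backbone.
        # Pool a single token state; position 0 is an assumption here.
        pooled = hidden_states[:, 0]
        return self.score(pooled).squeeze(-1)  # one relevance logit per pair
```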
DeAR 3B Family: this model (CE loss) and DeAR-3B-RankNet (see the comparison table above).

Larger Models: the DeAR-8B rerankers trade speed for accuracy.

Resources: the DeAR paper (arXiv:2508.16998) and the DeAR-Reranking collection.
Citation:

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```
License: MIT

Base model: meta-llama/Llama-3.2-3B