DeAR-8B-Reranker-CE-v1

Model Description

DeAR-8B-Reranker-CE-v1 is an 8B parameter neural reranker trained with Binary Cross-Entropy loss and knowledge distillation. This model uses a classification-based approach to document reranking and is optimized for both accuracy and inference speed.

Model Details

  • Model Type: Pointwise Reranker (Binary Classification)
  • Base Model: LLaMA-3.1-8B
  • Parameters: 8 billion
  • Training Method: Knowledge Distillation + Binary Cross-Entropy Loss
  • Teacher Model: LLaMA2-13B-RankLLaMA
  • Training Data: MS MARCO
  • Precision: BFloat16

Key Features

✅ Classification-based: Binary relevance prediction with probabilistic outputs
✅ Fast Inference: ~2.2 s average latency per 64-document batch (see Efficiency Metrics)
✅ Strong Baseline: Competitive performance across benchmarks
✅ CoT Enhanced: Trained with Chain-of-Thought reasoning from the teacher

Performance

| Benchmark    | NDCG@10 |
|--------------|---------|
| TREC DL19    | 73.9    |
| TREC DL20    | 72.1    |
| BEIR (Avg)   | 44.8    |
| MS MARCO Dev | 68.5    |

Usage

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:  # LLaMA tokenizers ship without a dedicated pad token
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
    
print(f"Relevance score: {score}")

Complete Reranking Example

import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def load_reranker(model_path: str, device: str = "cuda"):
    """Load the reranker model and tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16
    )
    
    # Configure padding token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"
    
    model.eval()
    model.to(device)
    return tokenizer, model

@torch.inference_mode()
def rerank(
    tokenizer,
    model,
    query: str,
    documents: List[Tuple[str, str]],  # (title, text)
    batch_size: int = 64
) -> List[Tuple[int, float]]:
    """
    Rerank documents for a query.
    
    Returns:
        List of (doc_index, score) sorted by relevance (descending)
    """
    device = next(model.parameters()).device
    scores = []
    
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        
        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]
        
        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
            return_attention_mask=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        # Score batch
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    
    # Rank by score
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example
tokenizer, model = load_reranker("abdoelsayed/dear-8b-reranker-ce-v1")

query = "When did Thomas Edison invent the light bulb?"
documents = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]

ranking = rerank(tokenizer, model, query, documents)
print(ranking)
# Output: [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
# Document at index 4 is most relevant
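
To recover the documents themselves in ranked order, index back into the original list (a small illustrative addition, not part of the rerank helper):

# Map the (index, score) pairs back to documents
ranked_docs = [documents[idx] for idx, _ in ranking]
print(ranked_docs[0][1])  # "Thomas Edison invented the light bulb in 1879"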

Training Details

Training Data

  • Primary Dataset: MS MARCO Passage Ranking (~8M pairs)
  • CoT Dataset: DeAR-COT
  • Teacher Annotations: Soft labels from 13B teacher model

Training Configuration

{
    "base_model": "meta-llama/Llama-3.1-8B",
    "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
    "loss": "Binary Cross-Entropy",
    "distillation": {
        "temperature": 2.0,
        "alpha": 0.1
    },
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 2,
    "gradient_accumulation": 2,
    "epochs": 2,
    "max_length": 228,
    "q_max_len": 32,
    "p_max_len": 196,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "bf16": true
}
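
For reference, with a per-device batch size of 2, gradient accumulation of 2, and data-parallel training across the 4 GPUs listed under Hardware, the effective batch size works out to 2 × 2 × 4 = 16 query-document pairs per optimizer step (assuming one data-parallel replica per GPU).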

Hardware

  • GPUs: 4x NVIDIA A100 (40GB)
  • Training Time: ~34 hours
  • Framework: DeepSpeed ZeRO Stage 2
  • Memory Usage: ~38GB per GPU

Loss Function

Binary Cross-Entropy with Knowledge Distillation:

L_total = (1 - α) * BCE(y_pred, y_true) + α * KL(σ(z_s/T), σ(z_t/T))

where:
- BCE: Binary cross-entropy loss
- KL: KL divergence
- z_s: Student logits
- z_t: Teacher logits
- T: Temperature (2.0)
- α: Distillation weight (0.1)
- σ: Sigmoid function
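
The following is a minimal PyTorch sketch of this objective, not the released training code; the function and variable names are illustrative:

import torch
import torch.nn.functional as F

def binary_kl(p, q, eps=1e-6):
    """KL divergence between Bernoulli distributions with probabilities p and q."""
    p = p.clamp(eps, 1 - eps)
    q = q.clamp(eps, 1 - eps)
    return (p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()).mean()

def distillation_bce_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.1):
    """(1 - alpha) * BCE on hard labels + alpha * KL between temperature-scaled sigmoids."""
    bce = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    kd = binary_kl(torch.sigmoid(student_logits / T), torch.sigmoid(teacher_logits / T))
    return (1 - alpha) * bce + alpha * kd

# Toy usage: a batch of 4 query-document pairs with random logits
loss = distillation_bce_loss(torch.randn(4), torch.randn(4), torch.tensor([1.0, 0.0, 1.0, 0.0]))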

Evaluation Results

TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP   |
|---------|---------|---------|--------|-------|
| DL19    | 73.90   | 69.82   | 87.3   | 44.92 |
| DL20    | 72.10   | 68.45   | 85.1   | 42.67 |

BEIR Benchmark

| Dataset    | NDCG@10 | NDCG@100 |
|------------|---------|----------|
| MS MARCO   | 68.5    | 75.2     |
| NQ         | 51.8    | 69.4     |
| HotpotQA   | 61.2    | 74.8     |
| FiQA       | 46.8    | 62.3     |
| ArguAna    | 58.9    | 71.5     |
| SciFact    | 73.1    | 82.6     |
| TREC-COVID | 84.7    | 88.3     |
| NFCorpus   | 39.4    | 51.7     |
| Average    | 44.8    | 68.2     |

Efficiency Metrics

| Metric                    | Value        |
|---------------------------|--------------|
| Inference Time (batch=64) | 2.2 s        |
| Throughput                | ~45 docs/sec |
| GPU Memory (inference)    | 18 GB        |
| Model Size (BF16)         | 16 GB        |

Comparison

| Model           | Loss    | DL19 | DL20 | BEIR Avg | Speed (s) |
|-----------------|---------|------|------|----------|-----------|
| DeAR-8B-CE      | BCE     | 73.9 | 72.1 | 44.8     | 2.2       |
| DeAR-8B-RankNet | RankNet | 74.5 | 72.8 | 45.2     | 2.2       |
| MonoT5-3B       | -       | 71.8 | 68.9 | 43.5     | 3.5       |
| Teacher-13B     | -       | 73.8 | 71.2 | 44.8     | 5.8       |

Key Observations:

  • Slightly lower performance than RankNet variant
  • Identical inference speed
  • More stable training (simpler loss)
  • Better for binary relevance tasks

Model Architecture

Input Format: "query: [QUERY] document: [TITLE] [TEXT]"
    ↓
Tokenization (max_length=228)
    ↓
LLaMA-3.1-8B Transformer
    ↓
[CLS] Token Pooling
    ↓
Linear(hidden_size → 1)
    ↓
Sigmoid (optional)
    ↓
Relevance Score
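
The tokenization step budgets 32 tokens for the query and 196 for the document (q_max_len + p_max_len = 228, matching the training configuration). Below is a hedged sketch of enforcing that split explicitly at inference time; the exact preprocessing used during training may differ:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-8b-reranker-ce-v1")

def build_pair(query: str, title: str, text: str, q_max_len: int = 32, p_max_len: int = 196):
    """Truncate query and document separately so the pair fits the 228-token budget."""
    q_ids = tokenizer.encode(f"query: {query}", add_special_tokens=False)[:q_max_len]
    d_ids = tokenizer.encode(f"document: {title} {text}", add_special_tokens=False)[:p_max_len]
    return tokenizer(
        tokenizer.decode(q_ids),
        tokenizer.decode(d_ids),
        return_tensors="pt",
        truncation=True,
        max_length=228,
    )

inputs = build_pair("What is llama?", "", "The llama is a domesticated South American camelid...")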

When to Use This Model

Best for:

  • ✅ Binary relevance classification
  • ✅ Large-scale reranking (fast inference)
  • ✅ General-purpose IR tasks
  • ✅ Resource-constrained environments

Consider alternatives for:

  • ❌ Listwise ranking (use DeAR-8B-Listwise)
  • ❌ Maximum performance (use RankNet variant)
  • ❌ Extreme low-latency (use 3B models)

Limitations

  1. Document Truncation: Limited to 196 tokens per document
  2. Query Length: Optimal for queries ≤32 tokens
  3. Language: English only
  4. Domain: Trained on MS MARCO (web documents)
  5. Pointwise: Does not model inter-document dependencies

Bias and Ethical Considerations

  • Training Data Bias: Inherits biases from MS MARCO dataset
  • Representation Bias: May perform differently across demographics
  • Language Bias: Optimized for English; other languages not evaluated
  • Domain Bias: Best performance on web-style documents

Recommendations:

  • Evaluate fairness for your specific use case
  • Test on diverse query sets
  • Monitor for biased ranking patterns
  • Consider domain-specific fine-tuning

Fine-tuning

To fine-tune on your own data:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-8b-reranker-ce-v1",
    num_labels=1
)

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    learning_rate=5e-6,  # Lower LR for fine-tuning
    per_device_train_batch_size=4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
)

trainer.train()
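
The your_dataset object in the example above is expected to yield tokenized query-document pairs with a label. Below is a minimal sketch using the same input format as inference; the dataset class, field names, and toy examples are assumptions. Note that with num_labels=1 the Hugging Face classification head defaults to a regression (MSE) loss unless config.problem_type is set otherwise.

from torch.utils.data import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-8b-reranker-ce-v1")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

class PointwiseRerankDataset(Dataset):
    """(query, document, label) triples tokenized for pointwise fine-tuning."""

    def __init__(self, examples):
        self.examples = examples  # list of dicts with "query", "document", "label"

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ex = self.examples[idx]
        enc = tokenizer(
            f"query: {ex['query']}",
            f"document: {ex['document']}",
            truncation=True,
            max_length=228,
            padding="max_length",
        )
        enc["labels"] = float(ex["label"])  # 1.0 = relevant, 0.0 = not relevant
        return dict(enc)

your_dataset = PointwiseRerankDataset([
    {"query": "what is a llama", "document": "The llama is a South American camelid.", "label": 1},
    {"query": "what is a llama", "document": "Coffee is good for a diet.", "label": 0},
])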

Related Models

DeAR Family (8B):

Other Sizes:

Resources:

Citation

@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}

License

MIT License
