---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- pointwise
- lora
- peft
- ranknet
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-RankNet-LoRA-v1

## Model Description

**DeAR-8B-Reranker-RankNet-LoRA-v1** is a LoRA (Low-Rank Adaptation) adapter for neural reranking. This lightweight adapter can be applied to LLaMA-3.1-8B to create a pointwise reranker with minimal storage overhead: it achieves performance comparable to the fully fine-tuned model while requiring only ~100MB of storage.

## Model Details

- **Model Type:** LoRA Adapter for Pointwise Reranking
- **Base Model:** meta-llama/Llama-3.1-8B
- **Adapter Size:** ~100MB (vs. 16GB for the full model)
- **Training Method:** LoRA with RankNet Loss + Knowledge Distillation
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj

## Key Features

- ✅ **Lightweight:** Only ~100MB vs. 16GB for the full model
- ✅ **Efficient Training:** Trains ~3x faster than full fine-tuning
- ✅ **Easy Deployment:** Just load the adapter on top of the base model
- ✅ **Comparable Performance:** ~98% of full-model performance
- ✅ **Memory Efficient:** Lower GPU memory usage during training

## Usage

### Option 1: Load with PEFT (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# Load LoRA adapter
adapter_path = "abdoelsayed/dear-8b-reranker-ranknet-lora-v1"

# Get base model from adapter config
config = PeftConfig.from_pretrained(adapter_path)
base_model_name = config.base_model_name_or_path

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=1,
    torch_dtype=torch.bfloat16
)

# Load and merge LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()  # Merge adapter into base model
model.eval().cuda()

# Use the model
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```
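If the merged model will be reused across sessions, the merged weights and tokenizer can be saved once and reloaded later as a plain `transformers` checkpoint, skipping the PEFT merge step. A minimal sketch, assuming the `model` and `tokenizer` objects from Option 1; the output directory name is just an example, and note that this writes the full merged checkpoint (~16GB), trading the adapter's storage advantage for simpler deployment:

```python
# Persist the merged reranker built in Option 1.
output_dir = "./dear-8b-reranker-ranknet-merged"  # example path
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Later: reload directly, no PEFT required.
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
# tokenizer = AutoTokenizer.from_pretrained(output_dir)
# model = AutoModelForSequenceClassification.from_pretrained(output_dir, num_labels=1)
```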
### Option 2: Use Helper Function

```python
import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig


def load_lora_ranker(adapter_path: str, device: str = "cuda"):
    """Load LoRA adapter and merge with base model."""
    # Get base model path from adapter config
    peft_config = PeftConfig.from_pretrained(adapter_path)
    base_model_name = peft_config.base_model_name_or_path

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"

    # Load base model
    base_model = AutoModelForSequenceClassification.from_pretrained(
        base_model_name,
        num_labels=1,
        torch_dtype=torch.bfloat16
    )

    # Load LoRA adapter and merge
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model = model.merge_and_unload()
    model.eval().to(device)

    return tokenizer, model


# Load model
tokenizer, model = load_lora_ranker("abdoelsayed/dear-8b-reranker-ranknet-lora-v1")


# Rerank documents
@torch.inference_mode()
def rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 64):
    """Rerank documents for a query."""
    device = next(model.parameters()).device
    scores = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {title} {text}" for title, text in batch]
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul"),
]
ranking = rerank(tokenizer, model, query, docs)
print(ranking)  # e.g. [(0, 5.2), (2, -3.1), (1, -4.8)]
```

### Using Without Merging (Memory Efficient)

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification

adapter_path = "abdoelsayed/dear-8b-reranker-ranknet-lora-v1"
config = PeftConfig.from_pretrained(adapter_path)

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load adapter (without merging)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Use the model (adapter layers are applied automatically)
# ... same inference code as above ...
```
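Keeping the adapter unmerged also lets several DeAR adapters share a single copy of the base model and be switched at runtime. The sketch below uses PEFT's multi-adapter API with the CE variant listed under Related Models as a second adapter; the adapter names are arbitrary, and exact handling of the per-adapter classification heads may depend on your PEFT version:

```python
import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

ranknet_path = "abdoelsayed/dear-8b-reranker-ranknet-lora-v1"
ce_path = "abdoelsayed/dear-8b-reranker-ce-lora-v1"  # CE adapter from Related Models

config = PeftConfig.from_pretrained(ranknet_path)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the RankNet adapter, then register the CE adapter alongside it
model = PeftModel.from_pretrained(base_model, ranknet_path, adapter_name="ranknet")
model.load_adapter(ce_path, adapter_name="ce")
model.eval()

model.set_adapter("ranknet")  # score with the RankNet-distilled adapter
# ... run inference ...
model.set_adapter("ce")       # switch to the cross-entropy adapter
```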
## Performance

| Benchmark  | LoRA | Full Model | Difference |
|------------|------|------------|------------|
| TREC DL19  | 74.2 | 74.5       | -0.3       |
| TREC DL20  | 72.5 | 72.8       | -0.3       |
| BEIR (Avg) | 44.9 | 45.2       | -0.3       |
| MS MARCO   | 68.6 | 68.9       | -0.3       |

✅ **98% of full model performance with only 0.6% of the storage!**

## Training Details
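The adapter is trained with a RankNet loss combined with knowledge distillation from a teacher reranker (see the paper for the exact objective). As a rough illustration only, a pairwise RankNet-style loss driven by teacher scores could look like the sketch below; the tensors and the hard teacher ordering are illustrative assumptions, not the exact DeAR formulation:

```python
import torch
import torch.nn.functional as F


def ranknet_distill_loss(student_scores: torch.Tensor,
                         teacher_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise RankNet-style loss over one query's candidate documents.

    Both inputs have shape (num_docs,): `student_scores` are the adapter's
    relevance logits, `teacher_scores` come from the teacher reranker.
    Each ordered pair (i, j) with i != j contributes a binary cross-entropy
    term pushing the student to reproduce the teacher's preference.
    """
    # Pairwise differences: entry (i, j) = score_i - score_j
    s_diff = student_scores.unsqueeze(1) - student_scores.unsqueeze(0)
    t_diff = teacher_scores.unsqueeze(1) - teacher_scores.unsqueeze(0)

    # Teacher preference: 1 if the teacher ranks doc i above doc j, else 0
    target = (t_diff > 0).float()

    # Ignore the diagonal (a document compared with itself)
    mask = ~torch.eye(student_scores.size(0), dtype=torch.bool,
                      device=student_scores.device)

    return F.binary_cross_entropy_with_logits(s_diff[mask], target[mask])


# Toy example: four candidate documents for one query
student = torch.tensor([2.1, -0.3, 0.7, -1.5])
teacher = torch.tensor([5.0, -2.0, 1.0, -3.0])
print(ranknet_distill_loss(student, teacher))
```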
### LoRA Configuration

```python
lora_config = {
    "r": 16,               # LoRA rank
    "lora_alpha": 32,      # Scaling factor
    "target_modules": [
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "SEQ_CLS"
}
```

### Training Hyperparameters

```python
training_args = {
    "learning_rate": 1e-4,       # Higher than full fine-tuning
    "batch_size": 4,             # Larger batch possible due to lower memory
    "gradient_accumulation": 2,
    "epochs": 2,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "max_length": 228,
    "bf16": True
}
```

### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~12 hours (3x faster than the full model)
- **Memory Usage:** ~28GB per GPU (vs. ~38GB for full fine-tuning)
- **Trainable Parameters:** 67M (0.8% of total)

## Advantages of LoRA Version

| Aspect          | LoRA   | Full Model |
|-----------------|--------|------------|
| Storage         | 100MB  | 16GB       |
| Training Time   | 12h    | 36h        |
| Training Memory | 28GB   | 38GB       |
| Performance     | 98%    | 100%       |
| Loading Time    | Fast   | Slow       |
| Easy Updates    | ✅ Yes | ❌ No      |

## When to Use LoRA vs Full Model

**Use LoRA when:**
- ✅ Storage is limited
- ✅ Training multiple domain-specific versions
- ✅ You need fast iteration and experimentation
- ✅ A 0.3 NDCG@10 difference is acceptable

**Use the Full Model when:**
- Maximum performance is required
- Storage is not a concern
- You have a single production deployment

## Fine-tuning on Your Data

```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    num_labels=1
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Apply LoRA
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 67M || all params: 8B || trainable%: 0.8%

# Train
training_args = TrainingArguments(
    output_dir="./lora-finetuned",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,  # your ranking dataset
)
trainer.train()

# Save only the LoRA adapter
model.save_pretrained("./lora-adapter")
```

## Model Files

This adapter contains:
- `adapter_config.json` - LoRA configuration
- `adapter_model.safetensors` or `adapter_model.bin` - Adapter weights (~100MB)
- `README.md` - This documentation

## Related Models

**Full Model:**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - Full fine-tuned version

**Other LoRA Adapters:**
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) - Binary Cross-Entropy
- [DeAR-8B-Listwise-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-lora-v1) - Listwise ranking

**Resources:**
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)