---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
license: apache-2.0
datasets:
- nyu-mll/glue
language:
- en
metrics:
- accuracy
- f1
- matthews_correlation
pipeline_tag: text-classification
---

# MNLI - LLaMA 3.2 1B - QLoRA (10k subset, 4-bit)

## Model Summary

This is a **QLoRA fine-tuned** version of [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the **MNLI** (Multi-Genre Natural Language Inference) dataset from [GLUE](https://huggingface.co/datasets/glue/viewer/mnli).

- **Base model:** LLaMA 3.2 1B Instruct
- **Fine-tuning method:** [QLoRA](https://arxiv.org/abs/2305.14314) with 4-bit quantization
- **Train subset:** 10k examples drawn from the MNLI train split (8k train / 1k validation / 1k test)
- **Evaluation:** official GLUE dev sets (matched / mismatched) plus the held-out 1k test split
- **Trainable parameters:** 5.64M (0.45% of the base model)
- **Hardware:** NVIDIA T4 (fp16)

⚠️ **Note:** This repo contains only the **LoRA adapter weights**. You need access to the base model from Meta to use it.

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "streetelite/mnli-llama3.2-1b-qlora-10k"

# Same 4-bit NF4 configuration used during training.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

# Load the quantized base model with a 3-way classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=3,
    quantization_config=bnb,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter; this also restores the trained classifier head.
model = PeftModel.from_pretrained(model, ADAPTER).eval()

inputs = tokenizer(
    "A man is playing guitar.",    # premise
    "A person is making music.",   # hypothesis
    return_tensors="pt",
    truncation=True,
)

with torch.inference_mode():
    logits = model(**{k: v.to(model.device) for k, v in inputs.items()}).logits

# Labels follow the GLUE MNLI convention: 0 = entailment, 1 = neutral, 2 = contradiction.
probs = logits.softmax(-1)
print(probs)
```

## Results

### GLUE dev (official)

| Set        | Accuracy | F1 (macro) | F1 (weighted) | MCC    | Kappa  | MAE    |
|------------|----------|------------|---------------|--------|--------|--------|
| Matched    | 82.37%   | 0.8210     | 0.8224        | 0.7358 | 0.7349 | 0.2068 |
| Mismatched | 83.71%   | 0.8348     | 0.8360        | 0.7558 | 0.7550 | 0.1894 |

---

### Held-out test split (1k from train)

| Accuracy | F1 (macro) | F1 (weighted) | MCC    | Kappa  | MAE    |
|----------|------------|---------------|--------|--------|--------|
| 83.10%   | 0.8280     | 0.8288        | 0.7496 | 0.7461 | 0.2010 |

A sketch of how these metrics can be computed is given in the appendix at the end of this card.

---

## Training Details

- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Quantization:** 4-bit NF4 with double quantization
- **LoRA config:** r=8, alpha=16, dropout=0.1; target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Optimizer:** paged_adamw_8bit, lr=2e-4
- **Batch size:** 4, with gradient accumulation 4 (effective batch size 16)
- **Epochs:** 2
- **Seed:** 42
- **Padding:** dynamic

A reconstruction of this setup as a runnable script is sketched in the appendix below.

---

## Intended Uses

- **Primary:** natural language inference on English text pairs (entailment, neutral, contradiction).
- **Languages:** English.
- **Not intended for:** non-English inputs, factual question answering, or safety-critical applications without human review.

---

## License

- **Base model:** LLaMA 3.2 1B Instruct — [Meta license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
- **Adapter:** Apache License 2.0
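
---

## Appendix: Training Sketch

The exact training script is not published with this card, so the code below is a minimal sketch reconstructed from the hyperparameters listed under Training Details. The quantization, LoRA, and optimizer settings match the card; the dataset plumbing (how the 10k subset is drawn and split, the `tok` helper, and `output_dir`) is assumed for illustration.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

BASE = "meta-llama/Llama-3.2-1B-Instruct"

# 4-bit NF4 quantization with double quantization, as stated in the card.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA ships without a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=3, quantization_config=bnb, torch_dtype=torch.float16
)
model.config.pad_token_id = tokenizer.pad_token_id
model = prepare_model_for_kbit_training(model)

# LoRA settings from the card; SEQ_CLS also keeps the classifier head trainable.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, lora)

# Assumed subset construction: 10k shuffled examples, 8k of which are trained on,
# with 2k held out for the 1k validation / 1k test splits.
ds = load_dataset("nyu-mll/glue", "mnli")
subset = ds["train"].shuffle(seed=42).select(range(10_000))
splits = subset.train_test_split(test_size=2_000, seed=42)

def tok(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

train_ds = splits["train"].map(
    tok, batched=True, remove_columns=["premise", "hypothesis", "idx"]
)

args = TrainingArguments(
    output_dir="mnli-qlora",            # placeholder
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size 16
    num_train_epochs=2,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    fp16=True,
    seed=42,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorWithPadding(tokenizer),  # dynamic padding
).train()
```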
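
---

## Appendix: Metric Computation

The result tables report six metrics per evaluation set. Below is a minimal scikit-learn sketch of how they can be computed from prediction arrays; `y_true` and `y_pred` are placeholders, and the MAE figure assumes the integer class indices are read as ordinal values.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    matthews_corrcoef,
    mean_absolute_error,
)

# Placeholder labels in the GLUE MNLI convention (0/1/2); replace with real
# gold labels and model predictions.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0])

print("accuracy     :", accuracy_score(y_true, y_pred))
print("f1 (macro)   :", f1_score(y_true, y_pred, average="macro"))
print("f1 (weighted):", f1_score(y_true, y_pred, average="weighted"))
print("mcc          :", matthews_corrcoef(y_true, y_pred))
print("kappa        :", cohen_kappa_score(y_true, y_pred))
print("mae          :", mean_absolute_error(y_true, y_pred))
```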