base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
tags:
  - base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
  - lora
  - transformers
license: apache-2.0
datasets:
  - nyu-mll/glue
language:
  - en
metrics:
  - accuracy
  - f1
  - matthews_correlation
pipeline_tag: text-classification

MNLI - LLaMA 3.2 1B - QLoRA (10k subset, 4-bit)

Model Summary

This is a QLoRA fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the MNLI (Multi-Genre Natural Language Inference) dataset from GLUE.

  • Base model: LLaMA 3.2 1B Instruct
  • Fine-tuning method: QLoRA with 4-bit quantization
  • Training data: a 10k-example subset of the MNLI train split, divided into 8k train / 1k validation / 1k test
  • Evaluation: Official GLUE dev sets (matched / mismatched) + held-out 1k test split
  • Trainable parameters: 5.64M (0.45% of base model)
  • Hardware: NVIDIA T4 (fp16)
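
The 8k / 1k / 1k split described above can be sketched with the standard library alone. This is an illustration, not the author's actual sampling code: the card does not say how the 10k examples were chosen, so uniform sampling without replacement and the use of the card's seed (42) for that step are both assumptions. `MNLI_TRAIN_SIZE` is the size of the GLUE MNLI train split.

```python
import random

MNLI_TRAIN_SIZE = 392_702  # number of examples in the GLUE MNLI train split
SEED = 42                  # seed listed in the card; its use for sampling is an assumption

rng = random.Random(SEED)

# Draw 10k distinct row indices from the train split (assumed: uniform, no replacement).
subset = rng.sample(range(MNLI_TRAIN_SIZE), 10_000)

# Slice the sampled indices into disjoint train / validation / test parts.
train_idx = subset[:8_000]
val_idx = subset[8_000:9_000]
test_idx = subset[9_000:]

print(len(train_idx), len(val_idx), len(test_idx))
```

Because all three parts come from one draw without replacement, no example can appear in more than one of them.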

⚠️ Note: This repo contains only the LoRA adapter weights. You need access to the base model from Meta to use it.


Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "streetelite/mnli-llama3.2-1b-qlora-10k"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=3,
    quantization_config=bnb,
    torch_dtype=torch.float16,
    device_map="auto"
)
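# Llama tokenizers may lack a pad token; set one so padded batches can be classified.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Attach the LoRA adapter weights to the quantized base model.
model = PeftModel.from_pretrained(model, ADAPTER).eval()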

inputs = tokenizer(
    "A man is playing guitar.",
    "A person is making music.",
    return_tensors="pt",
    truncation=True
)
with torch.inference_mode():
    logits = model(**{k: v.to(model.device) for k, v in inputs.items()}).logits
    probs = logits.softmax(-1)
print(probs)
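
The snippet above prints raw class probabilities. To turn them into an NLI label, something like the following sketch may help. It assumes the adapter kept the label order of the GLUE MNLI dataset (0 = entailment, 1 = neutral, 2 = contradiction); the card does not state the mapping, so verify it against the adapter's config before relying on it. `ID2LABEL`, `softmax`, and `predict_label` are illustrative names, and the logits are made-up values.

```python
import math

# Assumed label order: the class ids used by the GLUE MNLI dataset.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits for one premise/hypothesis pair, not real model output.
label, confidence = predict_label([2.1, 0.3, -1.5])
print(label, round(confidence, 3))
```

With the model loaded as shown earlier, `logits[0].tolist()` could be passed to `predict_label` in place of the hypothetical values.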

Results

GLUE dev (official)

| Set | Accuracy | F1 (macro) | F1 (weighted) | MCC | Kappa | MAE |
|---|---|---|---|---|---|---|
| Matched | 82.37% | 0.8210 | 0.8224 | 0.7358 | 0.7349 | 0.2068 |
| Mismatched | 83.71% | 0.8348 | 0.8360 | 0.7558 | 0.7550 | 0.1894 |

Held-out test split (1k from train)

| Accuracy | F1 (macro) | F1 (weighted) | MCC | Kappa | MAE |
|---|---|---|---|---|---|
| 83.10% | 0.8280 | 0.8288 | 0.7496 | 0.7461 | 0.2010 |
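
For readers unfamiliar with the reported metrics, here is a minimal sketch of how three of them could be computed from gold and predicted class ids. The card does not say how MAE was obtained; this sketch assumes it is the mean absolute difference between integer label ids, under which an entailment/contradiction confusion counts twice as much as a confusion with neutral. The function names and toy labels are illustrative, not the author's evaluation code.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted id equals the gold id."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred, num_classes=3):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

def label_mae(gold, pred):
    """Mean absolute difference between label ids (assumed definition of MAE)."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)

# Toy labels for illustration only, not taken from the evaluation runs.
gold = [0, 1, 2, 0, 1, 2]
pred = [0, 1, 2, 0, 2, 1]
print(accuracy(gold, pred), macro_f1(gold, pred), label_mae(gold, pred))
```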

Training Details

  • Framework: Hugging Face Transformers + PEFT + bitsandbytes
  • Quantization: 4-bit NF4 w/ double quantization
  • LoRA config: r=8, alpha=16, dropout=0.1, target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Optimizer: paged_adamw_8bit, lr=2e-4
  • Batch size: 4 (gradient accumulation = 4 → effective 16)
  • Epochs: 2
  • Seed: 42
  • Padding: dynamic
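
As a sanity check on the configuration above, the following arithmetic sketch estimates the number of LoRA parameters and optimizer steps. The layer dimensions are assumptions taken from the public Llama 3.2 1B configuration rather than from this card, and the step count assumes every one of the 8k training examples is seen once per epoch. Under those assumptions the adapter count comes out at roughly 5.64M, in line with the figure reported in the summary.

```python
# Assumed Llama 3.2 1B dimensions (from the public model config, not from this card).
HIDDEN = 2048
INTERMEDIATE = 8192
KV_DIM = 512       # 8 key/value heads x 64 dims per head
NUM_LAYERS = 16
NUM_LABELS = 3
RANK = 8           # LoRA r from the card

# (in_features, out_features) of each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted layer adds two matrices: A (r x in) and B (out x r).
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in TARGETS.values())
lora_params = per_layer * NUM_LAYERS

# Bias-free classification head; assumed to be trained alongside the adapter.
head_params = NUM_LABELS * HIDDEN

# Schedule: batch size 4 with 4 accumulation steps, 2 epochs over 8k examples.
effective_batch = 4 * 4
steps_per_epoch = 8_000 // effective_batch
total_steps = steps_per_epoch * 2

print(f"LoRA parameters: {lora_params:,}")
print(f"LoRA + head:     {lora_params + head_params:,}")
print(f"Optimizer steps: {total_steps}")
```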

Intended Uses

  • Primary: Natural language inference on text pairs (entailment, neutral, contradiction).
  • Languages: English.
  • Not intended for: non-English inputs, factual question answering, safety-critical applications without human review.

License

  • Base model: LLaMA 3.2 1B Instruct, under Meta's Llama 3.2 Community License
  • Adapter: Apache License 2.0