---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
license: apache-2.0
datasets:
- nyu-mll/glue
language:
- en
metrics:
- accuracy
- f1
- matthews_correlation
pipeline_tag: text-classification
---
# MNLI - LLaMA 3.2 1B - QLoRA (10k subset, 4-bit)
## Model Summary
This is a **QLoRA fine-tuned** version of [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the **MNLI** (Multi-Genre Natural Language Inference) dataset from [GLUE](https://huggingface.co/datasets/glue/viewer/mnli).
- **Base model:** LLaMA 3.2 1B Instruct
- **Fine-tuning method:** [QLoRA](https://arxiv.org/abs/2305.14314) with 4-bit quantization
- **Train subset:** 10k examples sampled from the MNLI train split (8k train / 1k validation / 1k held-out test)
- **Evaluation:** Official GLUE dev sets (matched / mismatched) + held-out 1k test split
- **Trainable parameters:** 5.64M (0.45% of base model)
- **Hardware:** NVIDIA T4 (fp16)
⚠️ **Note:** This repo contains only the **LoRA adapter weights**. You need access to the base model from Meta to use it.
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "streetelite/mnli-llama3.2-1b-qlora-10k"

# 4-bit NF4 quantization with double quantization, matching the training setup
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the tokenizer from the adapter repo so padding/special-token setup matches training
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

# Load the quantized base model with a 3-way classification head, then attach the LoRA adapter
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=3,
    quantization_config=bnb,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER).eval()

# Encode a (premise, hypothesis) pair
inputs = tokenizer(
    "A man is playing guitar.",
    "A person is making music.",
    return_tensors="pt",
    truncation=True,
)

with torch.inference_mode():
    logits = model(**{k: v.to(model.device) for k, v in inputs.items()}).logits
probs = logits.softmax(-1)
print(probs)
```
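The head outputs three logits. Assuming the standard GLUE/MNLI label order (0 = entailment, 1 = neutral, 2 = contradiction; verify against the adapter's `model.config.id2label` if in doubt), the prediction can be decoded like this:

```python
# Assumed GLUE/MNLI label order; check model.config.id2label before relying on it
labels = ["entailment", "neutral", "contradiction"]
print(labels[probs.argmax(-1).item()])
```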
## Results
### GLUE dev (official)
| Set | Accuracy | F1 (macro) | F1 (weighted) | MCC | Kappa | MAE |
|------------|----------|------------|---------------|--------|--------|--------|
| Matched | 82.37% | 0.8210 | 0.8224 | 0.7358 | 0.7349 | 0.2068 |
| Mismatched | 83.71% | 0.8348 | 0.8360 | 0.7558 | 0.7550 | 0.1894 |
---
### Held-out test split (1k from train)
| Accuracy | F1 (macro) | F1 (weighted) | MCC | Kappa | MAE |
|----------|------------|---------------|--------|--------|--------|
| 83.10% | 0.8280 | 0.8288 | 0.7496 | 0.7461 | 0.2010 |
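The evaluation script is not included in this repo; a minimal sketch of how the reported metrics can be computed with standard `sklearn.metrics` functions (`y_true` and `y_pred` are hypothetical integer label arrays):

```python
from sklearn.metrics import (
    accuracy_score, f1_score, matthews_corrcoef,
    cohen_kappa_score, mean_absolute_error,
)

def report(y_true, y_pred):
    # y_true / y_pred: integer class labels in {0, 1, 2}
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
    }
```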
---
## Training Details
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Quantization:** 4-bit NF4 w/ double quantization
- **LoRA config:** r=8, alpha=16, dropout=0.1, target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (see the configuration sketch after this list)
- **Optimizer:** paged_adamw_8bit, lr=2e-4
- **Batch size:** 4 (gradient accumulation = 4 → effective 16)
- **Epochs:** 2
- **Seed:** 42
- **Padding:** dynamic
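
The training script is not part of this repo; the hyperparameters above translate roughly to the following PEFT/Transformers configuration. This is a sketch, not the exact script: dataset loading, tokenization, and the `Trainer` call are omitted, and `model` is assumed to be the 4-bit base model loaded as in the Usage section.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments

lora = LoraConfig(
    task_type="SEQ_CLS",  # sequence classification head
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Prepare the 4-bit model for training, then wrap it with the LoRA adapter
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora)

args = TrainingArguments(
    output_dir="out",
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    num_train_epochs=2,
    fp16=True,
    seed=42,
)
```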
---
## Intended Uses
- **Primary:** Natural language inference on text pairs (entailment, neutral, contradiction).
- **Languages:** English.
- **Not intended for:** non-English inputs, factual question answering, safety-critical applications without human review.
---
## License
- **Base model:** LLaMA 3.2 1B Instruct — [Meta license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
- **Adapter:** Apache License 2.0