---
license: mit
tags:
  - language-model
  - instruction-tuning
  - lora
  - adalora
  - qlora
  - tinyllama
  - text-generation
---

# 🦙 TinyLlama Instruction-Tuned Models: LoRA, AdaLoRA, QLoRA

This repo hosts three LoRA-style adapters for TinyLlama 1.1B, each fine-tuned with a different parameter-efficient method:
- ✅ **LoRA** (Low-Rank Adaptation)
- ✅ **AdaLoRA** (Adaptive LoRA, which reallocates the rank budget across weight matrices during training)
- ✅ **QLoRA** (LoRA trained on top of a 4-bit quantized base model, for low-memory environments)
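
All three methods share one core idea: the pretrained weight matrix `W` stays frozen, and only a low-rank update `ΔW = B·A` is trained. The NumPy sketch below illustrates this for a single 2048×2048 matrix, since 2048 is TinyLlama's hidden size. The rank `r = 8` and scaling `alpha = 16` are illustrative assumptions, not the hyperparameters used for these checkpoints. AdaLoRA varies the effective rank per matrix during training, and QLoRA applies the same kind of update on top of a quantized base.

```python
import numpy as np

# Illustrative sizes: d_in/d_out match TinyLlama's hidden size (2048); r and alpha are assumed values
d_out, d_in, r, alpha = 2048, 2048, 8, 16
scale = alpha / r  # LoRA scaling factor
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in)).astype(np.float32)      # frozen pretrained weight
A = (0.01 * rng.standard_normal((r, d_in))).astype(np.float32)  # trainable, small random init
B = np.zeros((d_out, r), dtype=np.float32)                      # trainable, zero init

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); the rank-r path never builds the full d_out x d_in update
    return W @ x + scale * (B @ (A @ x))

def merged_weight():
    # Conceptually what merging does: fold the update into W, giving W + (alpha / r) * B A
    return W + scale * (B @ A)

full_params = d_out * d_in        # parameters in a full fine-tune of this matrix
lora_params = r * (d_in + d_out)  # parameters in A and B together
print(full_params, lora_params, f"{lora_params / full_params:.2%}")

x = rng.standard_normal(d_in).astype(np.float32)
# Because B starts at zero, the adapted layer initially matches the frozen base layer
print(np.allclose(lora_forward(x), W @ x))
```

With these assumed values, the adapter trains about 0.78% of the parameters of this one matrix.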

The adapters were trained on a custom instruction-response dataset for general-purpose instruction following.
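
As an illustration only, assuming each training pair was rendered with the same `### Instruction:` / `### Response:` template that the inference example below uses, a single record might be formatted as follows. The `instruction`/`response` field names, the sample record, and the appended `</s>` end-of-sequence marker are all assumptions, not the actual dataset schema.

```python
# Hypothetical record: field names and content are illustrative, not taken from the real dataset
record = {"instruction": "Name three primary colors.", "response": "Red, blue, and yellow."}

EOS = "</s>"  # Llama-tokenizer end-of-sequence token; assumed to be appended during training

def format_example(rec, eos=EOS):
    # Same Instruction/Response layout as the prompt built in ask() in the inference example
    prompt = f"### Instruction:\n{rec['instruction']}\n\n### Response:\n"
    return prompt + rec["response"] + eos

print(format_example(record))
```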

---

## 📦 Model Variants

| Name    | Folder Name               | Method  | Notes                            |
|---------|---------------------------|---------|----------------------------------|
| LoRA    | `lora-tinyllama-final`    | LoRA    | Standard LoRA adapter            |
| AdaLoRA | `adalora-tinyllama-final` | AdaLoRA | Rank-adaptive LoRA               |
| QLoRA   | `qlora-tinyllama-final`   | QLoRA   | LoRA trained on a 4-bit base     |
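
Each folder is loaded as a PEFT adapter on top of the base model rather than as a standalone model. PEFT records the method in the `peft_type` field of the folder's `adapter_config.json`. A QLoRA adapter is saved as an ordinary LoRA adapter (`"LORA"`), because the 4-bit quantization applies to the base model during training, not to the adapter weights. A small stdlib-only helper to inspect a folder might look like this (the function name `adapter_method` is hypothetical):

```python
import json
from pathlib import Path

def adapter_method(adapter_dir):
    # PEFT stores the adapter configuration, including "peft_type", in adapter_config.json
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # e.g. "LORA" or "ADALORA"; QLoRA adapters also report "LORA"
    return config.get("peft_type")

for folder in ("lora-tinyllama-final", "adalora-tinyllama-final", "qlora-tinyllama-final"):
    # Only inspect folders that are actually present locally
    if (Path(folder) / "adapter_config.json").exists():
        print(f"{folder}: {adapter_method(folder)}")
```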

---

## 🧠 Base Model

- **Base**: [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Tokenizer**: SentencePiece-based Llama tokenizer, with `pad_token` set to `eos_token`
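
For a rough sense of why a 4-bit base suits low-memory environments, the base model's weight footprint can be estimated as parameter count × bytes per parameter. This sketch uses a rounded 1.1 billion parameters and counts weights only; it is a back-of-the-envelope estimate, not a measurement of these checkpoints.

```python
# Rounded parameter count for TinyLlama 1.1B (the exact count differs slightly)
N_PARAMS = 1.1e9

# Storage cost per weight for common formats; 4-bit is the QLoRA base precision
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "4-bit": 0.5}

def weight_memory_gb(n_params, bytes_per_param):
    # Weights only: ignores activations, KV cache, quantization constants,
    # optimizer state, and layers usually kept unquantized (e.g. embeddings)
    return n_params * bytes_per_param / 1e9

for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt:>9}: {weight_memory_gb(N_PARAMS, nbytes):.2f} GB")
```

This prints about 4.40, 2.20, and 0.55 GB respectively; real usage is higher once the excluded items are counted.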

---

## 🚀 Inference Example

```python
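from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_dir = "lora-tinyllama-final"  # or "adalora-tinyllama-final", "qlora-tinyllama-final"

# Tokenizer is saved alongside the adapter; reuse EOS as the padding token
tokenizer = AutoTokenizer.from_pretrained(lora_dir)
tokenizer.pad_token = tokenizer.eos_token

# fp16 on GPU; fall back to fp32 on CPU, where fp16 generation may be slow or unsupported
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the base model, attach the adapter, then merge the adapter into the base weights
base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(base, lora_dir)
model = model.merge_and_unload()
model.eval()

def ask(instruction, max_new_tokens=150):
    # Wrap the request in the Instruction/Response prompt template
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,  # avoids the missing pad_token_id warning
        )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(ask("What is your name?"))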