---
license: mit
tags:
  - language-model
  - instruction-tuning
  - lora
  - adalora
  - qlora
  - tinyllama
  - text-generation
---

# 🦙 TinyLlama Instruction-Tuned Models: LoRA, AdaLoRA, QLoRA

This repo hosts a set of TinyLlama-1.1B models fine-tuned with several parameter-efficient fine-tuning (PEFT) methods:

- ✅ **LoRA** (Low-Rank Adaptation)
- ✅ **AdaLoRA** (Adaptive Low-Rank Adaptation with rank scheduling)
- ✅ **QLoRA** (Quantized LoRA for low-memory environments)

These models are fine-tuned on a custom instruction-response dataset for general-purpose instruction-following.
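For context, here is a minimal sketch of how LoRA and AdaLoRA adapters are typically configured with the Hugging Face `peft` library. The ranks, alpha, dropout, schedule steps, and target modules below are illustrative assumptions, not the exact settings used to train these checkpoints.

```python
# Sketch of LoRA / AdaLoRA adapter setup with peft.
# NOTE: all hyperparameters here are illustrative assumptions,
# not the exact values used to train the released adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, AdaLoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Plain LoRA: a fixed low-rank update on the attention projections.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# AdaLoRA: starts with a larger rank budget and prunes it down during training.
adalora_cfg = AdaLoraConfig(
    init_r=24,
    target_r=8,
    tinit=200,
    tfinal=1000,
    total_step=2000,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)  # swap in adalora_cfg for the AdaLoRA variant
model.print_trainable_parameters()
```

QLoRA uses the same adapter configuration, but attaches it to a 4-bit quantized base model (see the note after the inference example below).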


## 📦 Model Variants

| Name    | Folder Name                | Method  | Notes                     |
|---------|----------------------------|---------|---------------------------|
| LoRA    | `lora-tinyllama-final`     | LoRA    | Standard fine-tuned model |
| AdaLoRA | `adalora-tinyllama-final`  | AdaLoRA | Rank-adaptive LoRA        |
| QLoRA   | `qlora-tinyllama-final`    | QLoRA   | Quantized LoRA (int4)     |

## 🧠 Base Model

All variants are fine-tuned from [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).

## 🚀 Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_dir = "lora-tinyllama-final"  # or use "adalora-tinyllama-final", "qlora-tinyllama-final"

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(lora_dir)
tokenizer.pad_token = tokenizer.eos_token

# Load model + adapter
base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_dir)
model = model.merge_and_unload()
model.eval()

def ask(prompt):
    prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=150, temperature=0.7, top_p=0.9, do_sample=True)
    return tokenizer.decode(output[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(ask("What is your name?"))
```
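
To run the QLoRA variant in a genuinely low-memory setting, you can instead load the base model in 4 bits with `bitsandbytes` and keep the adapter attached rather than merging it. This is a sketch under that assumption; the quantization settings below are reasonable defaults and may differ from those used during training.

```python
# Sketch: 4-bit inference for the QLoRA adapter (requires the bitsandbytes package).
# The quantization settings are assumed defaults, not the exact training configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
qlora_dir = "qlora-tinyllama-final"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(qlora_dir)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
# Keep the adapter unmerged: merging into quantized weights is best avoided.
model = PeftModel.from_pretrained(base, qlora_dir)
model.eval()
```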