Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - Merged Model

Merged model in 16-bit (BF16) precision, with the LoRA adapters folded into the base weights.

Model Details

  • Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
  • Fine-tuning dataset: openai/gsm8k
  • Model size: ~1B parameters
  • Tensor type: BF16 (merged 16-bit safetensors)

Related Models

  • LoRA adapters: see the adapters repository/directory referenced under Training Details
  • Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

Prompt Format

This model uses the Llama 3.2 chat template.
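Rendered, a two-turn prompt has roughly the shape below. This is the standard Llama 3 header-token layout; the stock Llama 3.2 template may additionally inject metadata (e.g. a knowledge-cutoff date) into the system block, so use apply_chat_template rather than building the string by hand:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Your question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>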

Python Usage

Use the tokenizer's apply_chat_template() method. Passing add_generation_prompt=True appends the assistant header so the model starts its own turn instead of continuing the user's message:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"}
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
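To inspect the exact prompt string the template produces (including anything the Llama 3.2 template injects into the system header), render it without tokenizing:

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # the raw prompt string, headers and special tokens included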

Training Details

  • LoRA Rank: 32
  • Training Steps: 1870
  • Training Loss: 0.7500
  • Max Seq Length: 2048
  • Training Scope: 7,473 samples (2 epochs, full dataset)

For complete training configuration, see the LoRA adapters repository/directory.
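As a rough illustration only: with the standard peft API, a rank-32 LoRA setup like the one above would be declared along these lines. The alpha value and target modules below are assumptions, not this card's actual configuration; defer to the adapters repository:

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,            # LoRA rank, per the training details above
    lora_alpha=32,   # assumption; not stated on this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # typical choice for Llama models; assumption
    task_type="CAUSAL_LM",
)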

Benchmark Results

Benchmarked on the merged 16-bit safetensors model.

Evaluated: 2025-11-24 14:29

Model                                      Type        gsm8k
unsloth/Llama-3.2-1B-Instruct-bnb-4bit     Base        0.1463
Llama-3.2-1B-Instruct-bnb-4bit-gsm8k       Fine-tuned  0.3230
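The harness used to produce these numbers is not stated on the card. A minimal sketch of how a gsm8k accuracy figure can be reproduced is below, assuming model and tokenizer are loaded as in the Usage section and that the model emits answers in the dataset's "#### <number>" format (it was fine-tuned on gsm8k, so this is plausible but still an assumption):

import re
from datasets import load_dataset

ds = load_dataset("openai/gsm8k", "main", split="test")

def final_number(text):
    # gsm8k reference answers end with "#### <number>"
    m = re.search(r"####\s*(-?[\d,.]+)", text)
    return m.group(1).replace(",", "") if m else None

correct = 0
for ex in ds:
    messages = [{"role": "user", "content": ex["question"]}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    # decode only the newly generated tokens, not the prompt
    completion = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    pred, gold = final_number(completion), final_number(ex["answer"])
    if pred is not None and pred == gold:
        correct += 1

print(f"gsm8k accuracy: {correct / len(ds):.4f}")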

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit",
    torch_dtype="auto",  # load in the saved BF16 precision instead of upcasting to FP32
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit")

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)  # follow the device chosen by device_map, so this also works on CPU
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
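If memory is tight, the merged 16-bit weights can also be re-quantized on the fly at load time. This is an optional sketch using transformers' bitsandbytes integration, not something the card prescribes:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16, matching the saved dtype
)
model = AutoModelForCausalLM.from_pretrained(
    "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit",
    quantization_config=bnb_config,
    device_map="auto",
)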

License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k. Please refer to the original model and dataset licenses.

Credits

Trained by: Your Name

Training pipeline:

Base components:

  • Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
  • Dataset: openai/gsm8k
