Model Card for Kazchoko/llama3.2-3b-dpo-kaz-lora
Model Details
Model Description
This model fine-tunes meta-llama/Llama-3.2-3B-Instruct using the Direct Preference Optimization (DPO) objective with a LoRA adapter configuration.
The goal is to align the base model’s responses with preference data that reflects my personal persona and values.
- Developed by: Kazunori Fukuhara (@Kazchoko)
- Institution: Stanford Graduate School of Education / AI Tinkery
- License: Follows the base model’s license (Llama 3.2 Community License)
- Languages: English
- Finetuned from: meta-llama/Llama-3.2-3B-Instruct
- Frameworks: transformers, trl, peft, bitsandbytes, torch
- Model Type: Causal Language Model (Instruction-tuned + DPO LoRA)
Model Sources
- Repository: https://huggingface.co/Kazchoko/llama3.2-3b-dpo-kaz-lora
- Base Model: meta-llama/Llama-3.2-3B-Instruct
Uses
The only intended use is for Homework 1 (HW1) of the CS329X course.
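For reference, below is a minimal loading and generation sketch, assuming the adapter repository listed above is applied on top of the base model with PEFT. This is illustrative only, not an official snippet from the author.

```python
# Illustrative inference sketch: load the base model and apply the DPO LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Kazchoko/llama3.2-3b-dpo-kaz-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Build a chat-style prompt and generate a response.
messages = [{"role": "user", "content": "What do you think of the death penalty?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```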
Bias, Risks, and Limitations
- DPO alignment depends on the composition of the preference data; because the examples were selected according to personal preference, they can amplify individual biases.
- The model inherits the limitations of Llama 3.2, including occasional hallucination and verbosity.
- Because the dataset targeted social/ethical prompts (e.g., “What do you think of the death penalty?”), responses may reflect the implicit values of the training data rather than objective neutrality.
Training Details
Training Data
The model was fine-tuned on a preference dataset (PREFERENCE_DATA) curated by Kazunori Fukuhara.
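TRL’s DPOTrainer consumes preference pairs with prompt, chosen, and rejected fields. The record below is a purely illustrative sketch of that format; the response strings are placeholders, not entries from PREFERENCE_DATA.

```python
# Illustrative record format for a DPO preference dataset (placeholder strings,
# not actual entries from PREFERENCE_DATA).
from datasets import Dataset

preference_records = [
    {
        "prompt": "What do you think of the death penalty?",
        "chosen": "A response the annotator preferred ...",
        "rejected": "A less-preferred response ...",
    },
]
train_dataset = Dataset.from_list(preference_records)
```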
Training Procedure
Fine-tuning was performed using TRL’s DPOTrainer and PEFT LoRA adapters with 4-bit quantization; a minimal sketch of this setup follows the hyperparameter table below.
Training Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Learning Rate | 1 × 10⁻⁵ |
| Batch Size | 2 |
| Epochs | 3 |
| LoRA r | 8 |
| LoRA α | 16 |
| LoRA Dropout | 0.1 |
| Precision | bfloat16 |
| Quantization | 4-bit NF4 |
| Optimizer | default (AdamW) |
| Trainer | TRL DPOTrainer |
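Below is a minimal sketch of how these pieces fit together, assuming the TRL 0.8-era DPOTrainer signature (newer TRL releases use DPOConfig and processing_class instead), a placeholder preference dataset in the format sketched under Training Data, LoRA target modules left at PEFT defaults (not stated in this card), and the default DPO beta of 0.1 (also not stated here). The actual training script may differ.

```python
# Illustrative training sketch mirroring the hyperparameter table above.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import DPOTrainer

base_id = "meta-llama/Llama-3.2-3B-Instruct"

# Placeholder preference pairs; see the format sketch under Training Data.
train_dataset = Dataset.from_list([
    {
        "prompt": "What do you think of the death penalty?",
        "chosen": "A response the annotator preferred ...",
        "rejected": "A less-preferred response ...",
    },
])

# 4-bit NF4 quantization with bfloat16 compute, as listed in the table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapter configuration (r=8, alpha=16, dropout=0.1); target modules
# are not stated in the card, so PEFT defaults apply.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama3.2-3b-dpo-kaz-lora",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    learning_rate=1e-5,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # with a PEFT config, the disabled adapter serves as the reference model
    args=training_args,
    beta=0.1,         # assumption: DPO beta left at its default
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```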
Compute Details
| Component | Details |
|---|---|
| Hardware | A100 / T4 GPU (16–40 GB) |
| Environment | Google Colab Pro |
| Framework Versions | transformers 4.44+, trl 0.8+, peft 0.10.0, bitsandbytes 0.43+, torch 2.3+ |