Model Card for Kazchoko/llama3.2-3b-kaz-dpo-lora

Model Details

Model Description

This model fine-tunes Meta's Llama 3.2 3B-Instruct using the Direct Preference Optimization (DPO) objective with a LoRA adapter configuration.
The goal is to align the base model's responses with preference data that reflects my personal persona and values.
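For reference, DPO optimizes the policy π_θ directly on preference pairs (x, y_w, y_l), where y_w is the preferred and y_l the rejected response, against a frozen reference model π_ref (here, the base Llama 3.2 3B-Instruct):

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

where σ is the sigmoid function and β controls how far the policy may drift from the reference model.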

  • Developed by: Kazunori Fukuhara (@Kazchoko)
    Institution: Stanford Graduate School of Education / AI Tinkery
    License: Follows the base model’s license (Llama 3.2 Community License)
    Languages: English
    Finetuned From: meta-llama/Llama-3.2-3B-Instruct
    Frameworks: transformers, trl, peft, bitsandbytes, torch
    Model Type: Causal Language Model (Instruction-tuned + DPO LoRA)

Uses

The only intended use is Homework 1 (HW1) in CS329X.
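A minimal inference sketch, assuming the adapter is published at Kazchoko/llama3.2-3b-kaz-dpo-lora and that you have access to the gated base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Kazchoko/llama3.2-3b-kaz-dpo-lora"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the DPO-trained LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "What do you think of the death penalty?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```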

Bias, Risks, and Limitations

  • DPO alignment depends on the balance of the preference data, which consists of examples selected by personal preference and can therefore amplify individual biases.
  • The model inherits the limitations of Llama 3.2, including occasional hallucination and verbosity.
  • Because the dataset targeted social/ethical prompts (e.g., “What do you think of the death penalty?”), responses may reflect the implicit values of the training data rather than objective neutrality.

Training Details

Training Data

The model was fine-tuned on a preference dataset (PREFERENCE_DATA) curated by Kazunori Fukuhara.

Training Procedure

Fine-tuning was performed using TRL’s DPOTrainer and PEFT’s LoRA adapters with 4-bit quantization.

Training Hyperparameters

Parameter       Value
Base Model      meta-llama/Llama-3.2-3B-Instruct
Learning Rate   1 × 10⁻⁵
Batch Size      2
Epochs          3
LoRA r          8
LoRA α          16
LoRA Dropout    0.1
Precision       bfloat16
Quantization    4-bit NF4
Optimizer       AdamW (default)
Trainer         TRL DPOTrainer

Compute Details

Hardware Environment: A100 / T4 GPU (16–40 GB), Google Colab Pro
Framework Versions: transformers 4.44+, trl 0.8+, peft 0.10.0, bitsandbytes 0.43+, torch 2.3+
