falcon-7b-custom-dpo-lora-lablebox
This model is a fine-tuned version of tiiuae/falcon-7b-instruct using Direct Preference Optimization (Custom DPO with Proper Loss).
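The exact "Custom DPO with Proper Loss" objective used for this run is not published with the card; as a point of reference, a standard DPO loss looks roughly like the sketch below. The function name, the use of summed per-sequence log-probabilities, and beta=0.1 are assumptions, not values taken from the training script.

```python
# Hedged sketch of a standard DPO loss; the custom implementation used for this
# model is not published, so names and beta=0.1 are assumptions.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward = beta * (policy log-prob minus frozen reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid margin between chosen and rejected completions
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```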
Model Description
- Training Method: Direct Preference Optimization (Custom DPO with Proper Loss)
- Base Model: Falcon-7B-Instruct
- Parameter Count: 6.92B (base model)
- LoRA Parameters: 0.0085% of model parameters trainable
- Hardware: Apple Silicon Mac (128GB RAM)
- Framework: PyTorch with MPS backend
Training Results
- Runtime: 38.15 minutes
- Steps: 150 optimizer steps (1200 forward passes)
- Loss Reduction: 98.97%
- Benchmark Quality Score: 1.00/1.00
Training Configuration
LoRA Configuration
- Rank (r): 2
- Alpha: 4
- Target Modules: query_key_value
- Dropout: 0.1
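An adapter matching these hyperparameters can be recreated approximately with PEFT's `LoraConfig`; in the sketch below, `bias="none"` and `task_type="CAUSAL_LM"` are assumptions rather than values taken from the training script.

```python
from peft import LoraConfig, get_peft_model

# Sketch of an equivalent adapter config; r, alpha, dropout, and the target module
# come from the list above, while bias="none" and task_type are assumptions.
lora_config = LoraConfig(
    r=2,
    lora_alpha=4,
    target_modules=["query_key_value"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)  # wraps the frozen Falcon base
```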
Training Parameters
- Learning Rate: 5e-5
- Gradient Accumulation: 8 steps
- Mixed Precision: FP16
- Scheduler: Cosine Annealing
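A training setup consistent with these parameters might look like the sketch below. The optimizer choice (AdamW), the `T_max` of 150, and the `dataloader` / `compute_dpo_loss` helpers are assumptions; FP16 mixed-precision handling is omitted for brevity.

```python
import torch

# peft_model: the LoRA-wrapped base model (see the LoRA sketch above)
optimizer = torch.optim.AdamW(peft_model.parameters(), lr=5e-5)
# Cosine annealing over the 150 optimizer steps reported above (T_max is an assumption)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)
accum_steps = 8  # gradient accumulation

for step, batch in enumerate(dataloader):                      # dataloader is assumed
    loss = compute_dpo_loss(peft_model, batch) / accum_steps   # hypothetical helper
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```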
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base_model, "abhinav302019/falcon-7b-custom-dpo-lora-lablebox")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("abhinav302019/falcon-7b-custom-dpo-lora-lablebox")

# Generate text
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
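Optionally, the adapter can be folded into the base weights for adapter-free inference. This is a standard PEFT operation, not something described in the original card:

```python
# Optional: merge the LoRA weights into the base model and drop the adapter wrapper
merged_model = model.merge_and_unload()
```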
Training Details
This model was trained as part of the Lablebox Take Home Assignment, demonstrating gradient-based training of large language models on consumer hardware.
Framework versions
- Transformers 4.44.2
- PyTorch 2.5.0.dev20240912
- PEFT 0.13.0
- Datasets 3.0.0
- Tokenizers 0.19.1