---
license: apache-2.0
base_model: tiiuae/falcon-7b-instruct
tags:
- generated_from_trainer
- falcon
- lora
- direct-preference-optimization-(custom-dpo-with-proper-loss)
datasets:
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
metrics:
- loss
library_name: transformers
model-index:
- name: falcon-7b-custom-dpo-lora-lablebox
  results: []
---

# falcon-7b-custom-dpo-lora-lablebox

This model is a fine-tuned version of [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) trained with Direct Preference Optimization (Custom DPO with Proper Loss).

## Model Description

- **Training Method**: Direct Preference Optimization (Custom DPO with Proper Loss)
- **Base Model**: Falcon-7B-Instruct
- **Parameter Count**: 6.92B (base model)
- **LoRA Parameters**: 0.0085% trainable
- **Hardware**: Apple Silicon Mac (128GB RAM)
- **Framework**: PyTorch with MPS backend

## Training Results

- **Runtime**: 38.15 minutes
- **Steps**: 150 optimizer steps (1200 forward passes)
- **Loss Reduction**: 98.97%
- **Benchmark Quality Score**: 1.00/1.00

## Training Configuration

### LoRA Configuration

- Rank (r): 2
- Alpha: 4
- Target Modules: query_key_value
- Dropout: 0.1

### Training Parameters

- Learning Rate: 5e-5
- Gradient Accumulation: 8 steps
- Mixed Precision: FP16
- Scheduler: Cosine Annealing

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "falcon-7b-custom-dpo-lora-lablebox")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("falcon-7b-custom-dpo-lora-lablebox")

# Generate text
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Details

This model was trained as part of the Lablebox Take Home Assignment, demonstrating gradient-based training of large language models on consumer hardware.

### Framework versions

- Transformers 4.44.2
- PyTorch 2.5.0.dev20240912
- PEFT 0.13.0
- Datasets 3.0.0
- Tokenizers 0.19.1
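
## Reproducing the LoRA Setup (Sketch)

The adapter hyperparameters listed under LoRA Configuration map onto a PEFT `LoraConfig` roughly as follows. This is a minimal sketch: the `bias` and `task_type` settings are assumptions, as they are not stated on this card.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Adapter hyperparameters as listed on this card
lora_config = LoraConfig(
    r=2,                                 # rank
    lora_alpha=4,                        # alpha
    target_modules=["query_key_value"],  # Falcon fused attention projection
    lora_dropout=0.1,
    bias="none",                         # assumption: not stated on the card
    task_type=TaskType.CAUSAL_LM,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", trust_remote_code=True
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # should report on the order of 0.0085% trainable
```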
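
## DPO Loss Reference (Sketch)

This card names the objective "Custom DPO with Proper Loss" but does not document its exact form. For reference only, the sketch below shows the standard DPO objective (Rafailov et al., 2023) over per-sequence log-probabilities of chosen and rejected responses; the `beta` value is an assumed default, and the custom variant used for this adapter may differ.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-probability ratios of the policy against the frozen reference model
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Preference margin scaled by beta (beta=0.1 is an assumption, not from this card)
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```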