---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full-alpha_0.5_batch32
  results: []
---

# zephyr-7b-dpo-full-alpha_0.5_batch32

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7742
- Rewards/chosen: -1.3339
- Rewards/rejected: -2.3724
- Rewards/accuracies: 0.7738
- Rewards/margins: 1.0385
- Logps/rejected: -497.4416
- Logps/chosen: -415.3672
- Logits/rejected: 1.0737
- Logits/chosen: -0.2206

## Model description

A DPO (Direct Preference Optimization) fine-tune of the Zephyr-7B SFT checkpoint, trained with TRL on binarized UltraFeedback preference pairs.

## Intended uses & limitations

More information needed

## Training and evaluation data

Trained and evaluated on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized preference dataset in which each prompt is paired with one chosen and one rejected completion.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.9799        | 0.0523 | 100  | 0.9798          | 0.0239         | -0.0223          | 0.7004             | 0.0462          | -262.4359      | -279.5893    | -2.5499         | -2.6053       |
| 0.9062        | 0.1047 | 200  | 0.8992          | -0.8789        | -1.2476          | 0.7044             | 0.3687          | -384.9631      | -369.8690    | -1.8739         | -2.0394       |
| 0.8625        | 0.1570 | 300  | 0.9493          | -0.8080        | -1.4211          | 0.7401             | 0.6130          | -402.3089      | -362.7793    | -0.4321         | -0.7556       |
| 0.8393        | 0.2093 | 400  | 0.8450          | -1.2171        | -1.8293          | 0.7381             | 0.6122          | -443.1324      | -403.6841    | 1.2281          | 0.4949        |
| 0.7936        | 0.2616 | 500  | 0.8415          | -1.5714        | -2.3906          | 0.7381             | 0.8192          | -499.2642      | -439.1208    | 1.2276          | 0.4250        |
| 0.8417        | 0.3140 | 600  | 0.8105          | -1.0803        | -1.9159          | 0.7599             | 0.8356          | -451.7932      | -390.0083    | 1.2799          | 0.5051        |
| 0.7909        | 0.3663 | 700  | 0.8043          | -1.0872        | -1.9984          | 0.75               | 0.9112          | -460.0458      | -390.6981    | 1.4415          | 0.4495        |
| 0.8545        | 0.4186 | 800  | 0.8065          | -1.8138        | -2.6104          | 0.7560             | 0.7966          | -521.2473      | -463.3611    | 1.7931          | 0.9682        |
| 0.7903        | 0.4710 | 900  | 0.8040          | -0.8990        | -1.7305          | 0.7579             | 0.8316          | -433.2554      | -371.8721    | -0.4830         | -1.1343       |
| 0.7805        | 0.5233 | 1000 | 0.7874          | -1.5900        | -2.5776          | 0.7738             | 0.9876          | -517.9631      | -440.9751    | 1.2055          | 0.1112        |
| 0.7927        | 0.5756 | 1100 | 0.7853          | -1.4465        | -2.4111          | 0.7579             | 0.9646          | -501.3155      | -426.6308    | 0.3121          | -0.7289       |
| 0.7714        | 0.6279 | 1200 | 0.7814          | -1.2916        | -2.3130          | 0.7679             | 1.0213          | -491.5005      | -411.1409    | 0.3216          | -0.8372       |
| 0.7514        | 0.6803 | 1300 | 0.7838          | -1.2233        | -2.2459          | 0.7718             | 1.0226          | -484.7902      | -404.3044    | 0.5839          | -0.7223       |
| 0.7356        | 0.7326 | 1400 | 0.7767          | -1.4917        | -2.5388          | 0.7698             | 1.0471          | -514.0866      | -431.1516    | 1.2245          | -0.1178       |
| 0.7475        | 0.7849 | 1500 | 0.7756          | -1.3568        | -2.3583          | 0.7639             | 1.0016          | -496.0364      | -417.6552    | 1.0529          | -0.2127       |
| 0.7625        | 0.8373 | 1600 | 0.7751          | -1.2270        | -2.2379          | 0.7778             | 1.0108          | -483.9888      | -404.6796    | 0.7870          | -0.4206       |
| 0.7493        | 0.8896 | 1700 | 0.7748          | -1.3789        | -2.4259          | 0.7718             | 1.0470          | -502.7920      | -419.8630    | 1.2325          | -0.0891       |
| 0.7604        | 0.9419 | 1800 | 0.7743          | -1.3323        | -2.3703          | 0.7718             | 1.0381          | -497.2375      | -415.2034    | 1.0742          | -0.2202       |
| 0.7654        | 0.9942 | 1900 | 0.7743          | -1.3347        | -2.3725          | 0.7718             | 1.0378          | -497.4510      | -415.4467    | 1.0747          | -0.2184       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.1+cu118
- Datasets 2.14.7
- Tokenizers 0.19.1
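### Interpreting the reward metrics

How the reward columns relate to each other can be checked by hand: `Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected`, and the vanilla DPO objective for a pair is the negative log-sigmoid of that margin (the rewards already include the beta scaling). A minimal plain-Python sketch, not part of the training code; note the reported evaluation loss (0.7742) need not equal this vanilla term, since this run's objective may differ (the `alpha_0.5` in the model name suggests a modified loss):

```python
import math

def dpo_margin_and_loss(reward_chosen: float, reward_rejected: float) -> tuple[float, float]:
    """Margin between implicit rewards, and the vanilla per-pair DPO loss
    -log(sigmoid(margin)) computed from it."""
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return margin, loss

# Using the evaluation-set values reported above:
margin, loss = dpo_margin_and_loss(-1.3339, -2.3724)
print(round(margin, 4))  # 1.0385, matching Rewards/margins above
```

A larger margin means the policy separates chosen from rejected completions more strongly, which is why `Rewards/margins` grows over training even as both absolute rewards drift negative.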