zephyr-7b-dpo-full-alpha_0.5_batch32
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.7742
- Rewards/chosen: -1.3339
- Rewards/rejected: -2.3724
- Rewards/accuracies: 0.7738
- Rewards/margins: 1.0385
- Logps/rejected: -497.4416
- Logps/chosen: -415.3672
- Logits/rejected: 1.0737
- Logits/chosen: -0.2206
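
For context (not stated in the original card): in DPO, the reward columns are the implicit rewards $r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$ computed against the frozen SFT reference model, and Rewards/margins is simply Rewards/chosen minus Rewards/rejected; the $\beta$ used for this run is not reported. The evaluation numbers above satisfy that identity:

$$
\text{margins} = r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}}) = -1.3339 - (-2.3724) = 1.0385
$$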
Model description
More information needed
Intended uses & limitations
More information needed
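
Until the authors document intended usage, the sketch below shows one plausible way to query the model. It assumes the tokenizer keeps the Zephyr chat template inherited from alignment-handbook/zephyr-7b-sft-full and uses the standard transformers text-generation pipeline; the prompt and generation settings are illustrative, not the authors' recommendations.

```python
# Unofficial inference sketch: chat generation via the transformers pipeline.
# Assumes the tokenizer ships the Zephyr chat template from the SFT base model.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="YeongminKim/zephyr-7b-dpo-full-alpha_0.5_batch32",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
# The pipeline returns the prompt plus the completion; strip the prompt for display.
print(out[0]["generated_text"][len(prompt):])
```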
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
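
These values match a typical alignment-handbook DPO recipe (per-device batch size 8 on 4 GPUs for an effective batch size of 32, with no gradient accumulation). As an unofficial illustration, the sketch below wires the listed hyperparameters into trl's DPOTrainer; the DPO beta, sequence lengths, precision, and the meaning of "alpha_0.5" in the model name are not stated in this card, so those parts are assumptions.

```python
# Unofficial reconstruction of the training setup from the hyperparameters above.
# Only the listed values come from the card; beta, bf16, and the dataset splits
# are assumptions, and "alpha_0.5" from the model name is not modelled here.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Depending on the trl version, the conversational "chosen"/"rejected" columns may
# need to be flattened to plain text before training.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full-alpha_0.5_batch32",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # 4 GPUs -> effective batch size 32
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,   # assumption: precision is not stated in the card
    beta=0.1,    # assumption: the DPO beta is not reported
)

# With ref_model left unset, DPOTrainer uses a frozen copy of the model as reference.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=ds["train_prefs"],
    eval_dataset=ds["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```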
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.9799 | 0.0523 | 100 | 0.9798 | 0.0239 | -0.0223 | 0.7004 | 0.0462 | -262.4359 | -279.5893 | -2.5499 | -2.6053 |
| 0.9062 | 0.1047 | 200 | 0.8992 | -0.8789 | -1.2476 | 0.7044 | 0.3687 | -384.9631 | -369.8690 | -1.8739 | -2.0394 |
| 0.8625 | 0.1570 | 300 | 0.9493 | -0.8080 | -1.4211 | 0.7401 | 0.6130 | -402.3089 | -362.7793 | -0.4321 | -0.7556 |
| 0.8393 | 0.2093 | 400 | 0.8450 | -1.2171 | -1.8293 | 0.7381 | 0.6122 | -443.1324 | -403.6841 | 1.2281 | 0.4949 |
| 0.7936 | 0.2616 | 500 | 0.8415 | -1.5714 | -2.3906 | 0.7381 | 0.8192 | -499.2642 | -439.1208 | 1.2276 | 0.4250 |
| 0.8417 | 0.3140 | 600 | 0.8105 | -1.0803 | -1.9159 | 0.7599 | 0.8356 | -451.7932 | -390.0083 | 1.2799 | 0.5051 |
| 0.7909 | 0.3663 | 700 | 0.8043 | -1.0872 | -1.9984 | 0.7500 | 0.9112 | -460.0458 | -390.6981 | 1.4415 | 0.4495 |
| 0.8545 | 0.4186 | 800 | 0.8065 | -1.8138 | -2.6104 | 0.7560 | 0.7966 | -521.2473 | -463.3611 | 1.7931 | 0.9682 |
| 0.7903 | 0.4710 | 900 | 0.8040 | -0.8990 | -1.7305 | 0.7579 | 0.8316 | -433.2554 | -371.8721 | -0.4830 | -1.1343 |
| 0.7805 | 0.5233 | 1000 | 0.7874 | -1.5900 | -2.5776 | 0.7738 | 0.9876 | -517.9631 | -440.9751 | 1.2055 | 0.1112 |
| 0.7927 | 0.5756 | 1100 | 0.7853 | -1.4465 | -2.4111 | 0.7579 | 0.9646 | -501.3155 | -426.6308 | 0.3121 | -0.7289 |
| 0.7714 | 0.6279 | 1200 | 0.7814 | -1.2916 | -2.3130 | 0.7679 | 1.0213 | -491.5005 | -411.1409 | 0.3216 | -0.8372 |
| 0.7514 | 0.6803 | 1300 | 0.7838 | -1.2233 | -2.2459 | 0.7718 | 1.0226 | -484.7902 | -404.3044 | 0.5839 | -0.7223 |
| 0.7356 | 0.7326 | 1400 | 0.7767 | -1.4917 | -2.5388 | 0.7698 | 1.0471 | -514.0866 | -431.1516 | 1.2245 | -0.1178 |
| 0.7475 | 0.7849 | 1500 | 0.7756 | -1.3568 | -2.3583 | 0.7639 | 1.0016 | -496.0364 | -417.6552 | 1.0529 | -0.2127 |
| 0.7625 | 0.8373 | 1600 | 0.7751 | -1.2270 | -2.2379 | 0.7778 | 1.0108 | -483.9888 | -404.6796 | 0.7870 | -0.4206 |
| 0.7493 | 0.8896 | 1700 | 0.7748 | -1.3789 | -2.4259 | 0.7718 | 1.0470 | -502.7920 | -419.8630 | 1.2325 | -0.0891 |
| 0.7604 | 0.9419 | 1800 | 0.7743 | -1.3323 | -2.3703 | 0.7718 | 1.0381 | -497.2375 | -415.2034 | 1.0742 | -0.2202 |
| 0.7654 | 0.9942 | 1900 | 0.7743 | -1.3347 | -2.3725 | 0.7718 | 1.0378 | -497.4510 | -415.4467 | 1.0747 | -0.2184 |
Framework versions
- Transformers 4.44.2
- Pytorch 2.2.1+cu118
- Datasets 2.14.7
- Tokenizers 0.19.1