zephyr-7b-dpo-full-alpha_0.5_batch32

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7742
  • Rewards/chosen: -1.3339
  • Rewards/rejected: -2.3724
  • Rewards/accuracies: 0.7738
  • Rewards/margins: 1.0385
  • Logps/rejected: -497.4416
  • Logps/chosen: -415.3672
  • Logits/rejected: 1.0737
  • Logits/chosen: -0.2206
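The reward metrics above are internally consistent: in DPO, the reported margin is the gap between the chosen and rejected rewards. A quick sanity check using the numbers from the list above:

```python
# Sanity check: Rewards/margins should equal Rewards/chosen - Rewards/rejected.
# Values copied from the evaluation results above.
rewards_chosen = -1.3339
rewards_rejected = -2.3724
rewards_margins = 1.0385

computed_margin = rewards_chosen - rewards_rejected
print(computed_margin)  # ~1.0385, matching the reported Rewards/margins

assert abs(computed_margin - rewards_margins) < 1e-3
```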

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.9799        | 0.0523 | 100  | 0.9798          | 0.0239         | -0.0223          | 0.7004             | 0.0462          | -262.4359      | -279.5893    | -2.5499         | -2.6053       |
| 0.9062        | 0.1047 | 200  | 0.8992          | -0.8789        | -1.2476          | 0.7044             | 0.3687          | -384.9631      | -369.8690    | -1.8739         | -2.0394       |
| 0.8625        | 0.1570 | 300  | 0.9493          | -0.8080        | -1.4211          | 0.7401             | 0.6130          | -402.3089      | -362.7793    | -0.4321         | -0.7556       |
| 0.8393        | 0.2093 | 400  | 0.8450          | -1.2171        | -1.8293          | 0.7381             | 0.6122          | -443.1324      | -403.6841    | 1.2281          | 0.4949        |
| 0.7936        | 0.2616 | 500  | 0.8415          | -1.5714        | -2.3906          | 0.7381             | 0.8192          | -499.2642      | -439.1208    | 1.2276          | 0.4250        |
| 0.8417        | 0.3140 | 600  | 0.8105          | -1.0803        | -1.9159          | 0.7599             | 0.8356          | -451.7932      | -390.0083    | 1.2799          | 0.5051        |
| 0.7909        | 0.3663 | 700  | 0.8043          | -1.0872        | -1.9984          | 0.7500             | 0.9112          | -460.0458      | -390.6981    | 1.4415          | 0.4495        |
| 0.8545        | 0.4186 | 800  | 0.8065          | -1.8138        | -2.6104          | 0.7560             | 0.7966          | -521.2473      | -463.3611    | 1.7931          | 0.9682        |
| 0.7903        | 0.4710 | 900  | 0.8040          | -0.8990        | -1.7305          | 0.7579             | 0.8316          | -433.2554      | -371.8721    | -0.4830         | -1.1343       |
| 0.7805        | 0.5233 | 1000 | 0.7874          | -1.5900        | -2.5776          | 0.7738             | 0.9876          | -517.9631      | -440.9751    | 1.2055          | 0.1112        |
| 0.7927        | 0.5756 | 1100 | 0.7853          | -1.4465        | -2.4111          | 0.7579             | 0.9646          | -501.3155      | -426.6308    | 0.3121          | -0.7289       |
| 0.7714        | 0.6279 | 1200 | 0.7814          | -1.2916        | -2.3130          | 0.7679             | 1.0213          | -491.5005      | -411.1409    | 0.3216          | -0.8372       |
| 0.7514        | 0.6803 | 1300 | 0.7838          | -1.2233        | -2.2459          | 0.7718             | 1.0226          | -484.7902      | -404.3044    | 0.5839          | -0.7223       |
| 0.7356        | 0.7326 | 1400 | 0.7767          | -1.4917        | -2.5388          | 0.7698             | 1.0471          | -514.0866      | -431.1516    | 1.2245          | -0.1178       |
| 0.7475        | 0.7849 | 1500 | 0.7756          | -1.3568        | -2.3583          | 0.7639             | 1.0016          | -496.0364      | -417.6552    | 1.0529          | -0.2127       |
| 0.7625        | 0.8373 | 1600 | 0.7751          | -1.2270        | -2.2379          | 0.7778             | 1.0108          | -483.9888      | -404.6796    | 0.7870          | -0.4206       |
| 0.7493        | 0.8896 | 1700 | 0.7748          | -1.3789        | -2.4259          | 0.7718             | 1.0470          | -502.7920      | -419.8630    | 1.2325          | -0.0891       |
| 0.7604        | 0.9419 | 1800 | 0.7743          | -1.3323        | -2.3703          | 0.7718             | 1.0381          | -497.2375      | -415.2034    | 1.0742          | -0.2202       |
| 0.7654        | 0.9942 | 1900 | 0.7743          | -1.3347        | -2.3725          | 0.7718             | 1.0378          | -497.4510      | -415.4467    | 1.0747          | -0.2184       |
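The Rewards/* columns come from the DPO objective: a completion's implicit reward is β times the log-probability ratio between the policy and the reference (SFT) model, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. A minimal sketch of that computation follows; the log-probabilities are illustrative placeholders and β is a common default, since the card does not report the β used:

```python
import math

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (chosen reward - rejected reward)).

    Implicit reward of a completion = beta * (policy logp - reference logp).
    beta=0.1 is a common default, not a value taken from this card.
    """
    reward_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    reward_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = reward_chosen - reward_rejected
    # Numerically stable -log(sigmoid(margin))
    if margin > 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, reward_chosen, reward_rejected

# Illustrative numbers only: a wider reward margin yields a smaller loss.
loss_small, _, _ = dpo_loss(-100.0, -120.0, -100.0, -110.0)  # margin = 1.0
loss_large, _, _ = dpo_loss(-100.0, -130.0, -100.0, -110.0)  # margin = 2.0
print(loss_small > loss_large)  # True
```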

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.1+cu118
  • Datasets 2.14.7
  • Tokenizers 0.19.1