# genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-06_beta0.1_epoch8.0_42
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:
- Loss: 1.2478
- Rewards/chosen: -0.2745
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.4250
- Rewards/margins: -0.2745
- Logps/rejected: -53.9242
- Logps/chosen: -32.8150
- Logits/rejected: -3.5353
- Logits/chosen: -3.4027
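In DPO-style training, each reward is the beta-scaled log-probability ratio between the policy and the frozen reference model; Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of pairs with a positive margin. A minimal sketch of how these metrics relate (plain Python, helper names illustrative, not this repo's code):

```python
# Sketch of how the DPO reward metrics reported above fit together.
# reward = beta * (policy_logp - reference_logp); names are illustrative.

def dpo_reward(beta, policy_logp, reference_logp):
    """Implicit DPO reward: beta-scaled log-probability ratio."""
    return beta * (policy_logp - reference_logp)

def margin(reward_chosen, reward_rejected):
    """Rewards/margins column: chosen reward minus rejected reward."""
    return reward_chosen - reward_rejected

def accuracy(margins):
    """Rewards/accuracies column: fraction of pairs with positive margin."""
    return sum(m > 0 for m in margins) / len(margins)

# With the final eval numbers above (rejected reward logged as 0.0):
print(margin(-0.2745, 0.0))              # -0.2745, matching Rewards/margins
print(accuracy([0.3, -0.1, 0.2, -0.4]))  # 0.5 on a toy batch
```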
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
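The cosine schedule with a 0.1 warmup ratio means the learning rate ramps linearly from 0 to 5e-06 over the first 10% of optimizer steps, then decays to 0 along a cosine curve. A sketch of that behavior (plain Python; the step count is an estimate from the training log, not a recorded value):

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup for warmup_ratio of training, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1432  # roughly 179 optimizer steps/epoch * 8 epochs (illustrative)
print(lr_at_step(0, total))      # 0.0 at the start of warmup
print(lr_at_step(143, total))    # 5e-06 at the end of warmup
print(lr_at_step(total, total))  # 0.0 once fully decayed
```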
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6339 | 0.1117 | 20 | 0.6293 | 0.1239 | 0.0 | 1.0000 | 0.1239 | -40.1835 | -28.8307 | -2.3134 | -2.4461 |
| 0.4231 | 0.2235 | 40 | 0.4187 | 0.7203 | 0.0 | 0.9500 | 0.7203 | -33.8767 | -22.8669 | -3.0016 | -3.0384 |
| 0.3949 | 0.3352 | 60 | 0.3773 | 0.9640 | 0.0 | 0.9750 | 0.9640 | -32.2958 | -20.4299 | -3.3011 | -3.2774 |
| 0.3546 | 0.4469 | 80 | 0.3637 | 0.9914 | 0.0 | 1.0000 | 0.9914 | -31.6176 | -20.1563 | -3.3013 | -3.2770 |
| 0.3059 | 0.5587 | 100 | 0.3664 | 1.0027 | 0.0 | 0.9750 | 1.0027 | -31.7914 | -20.0426 | -3.3690 | -3.3327 |
| 0.3504 | 0.6704 | 120 | 0.3799 | 0.9659 | 0.0 | 1.0000 | 0.9659 | -32.6139 | -20.4109 | -3.3379 | -3.3099 |
| 0.4161 | 0.7821 | 140 | 0.4016 | 0.8722 | 0.0 | 0.9750 | 0.8722 | -33.4282 | -21.3484 | -3.2793 | -3.2535 |
| 0.4314 | 0.8939 | 160 | 0.4172 | 0.8576 | 0.0 | 0.9750 | 0.8576 | -33.2324 | -21.4936 | -3.2940 | -3.2712 |
| 0.2584 | 1.0056 | 180 | 0.4375 | 0.7763 | 0.0 | 0.9250 | 0.7763 | -33.9398 | -22.3070 | -3.2443 | -3.2321 |
| 0.2477 | 1.1173 | 200 | 0.5238 | 0.7154 | 0.0 | 0.8000 | 0.7154 | -36.2846 | -22.9161 | -3.4659 | -3.4084 |
| 0.3476 | 1.2291 | 220 | 0.4947 | 0.7315 | 0.0 | 0.8750 | 0.7315 | -36.0092 | -22.7547 | -3.4213 | -3.3765 |
| 0.2478 | 1.3408 | 240 | 0.5380 | 0.7329 | 0.0 | 0.8750 | 0.7329 | -37.9193 | -22.7412 | -3.4929 | -3.4322 |
| 0.3341 | 1.4525 | 260 | 0.5285 | 0.7788 | 0.0 | 0.9000 | 0.7788 | -36.5301 | -22.2823 | -3.4798 | -3.4204 |
| 0.2892 | 1.5642 | 280 | 0.5287 | 0.7061 | 0.0 | 0.9250 | 0.7061 | -36.8482 | -23.0091 | -3.3743 | -3.3223 |
| 0.2768 | 1.6760 | 300 | 0.5382 | 0.6573 | 0.0 | 0.8500 | 0.6573 | -37.3328 | -23.4967 | -3.4178 | -3.3609 |
| 0.3726 | 1.7877 | 320 | 0.5422 | 0.6608 | 0.0 | 0.8500 | 0.6608 | -35.9154 | -23.4616 | -3.3283 | -3.3046 |
| 0.3073 | 1.8994 | 340 | 0.5740 | 0.6415 | 0.0 | 0.8250 | 0.6415 | -36.9027 | -23.6551 | -3.4064 | -3.3511 |
| 0.1934 | 2.0112 | 360 | 0.5502 | 0.7417 | 0.0 | 0.9000 | 0.7417 | -36.7407 | -22.6530 | -3.3965 | -3.3432 |
| 0.2036 | 2.1229 | 380 | 0.6724 | 0.6468 | 0.0 | 0.8500 | 0.6468 | -39.6545 | -23.6024 | -3.5622 | -3.4664 |
| 0.1782 | 2.2346 | 400 | 0.6572 | 0.5434 | 0.0 | 0.8500 | 0.5434 | -38.8220 | -24.6361 | -3.5365 | -3.4469 |
| 0.2196 | 2.3464 | 420 | 0.6808 | 0.4513 | 0.0 | 0.7500 | 0.4513 | -40.6678 | -25.5571 | -3.5865 | -3.4942 |
| 0.1398 | 2.4581 | 440 | 0.7209 | 0.3850 | 0.0 | 0.7500 | 0.3850 | -39.6323 | -26.2196 | -3.5268 | -3.4407 |
| 0.1952 | 2.5698 | 460 | 0.7184 | 0.4272 | 0.0 | 0.7500 | 0.4272 | -39.7520 | -25.7979 | -3.5159 | -3.4299 |
| 0.3116 | 2.6816 | 480 | 0.6880 | 0.5085 | 0.0 | 0.7750 | 0.5085 | -39.4111 | -24.9850 | -3.5145 | -3.4305 |
| 0.1611 | 2.7933 | 500 | 0.7117 | 0.4137 | 0.0 | 0.7500 | 0.4137 | -41.1864 | -25.9328 | -3.4832 | -3.4002 |
| 0.1948 | 2.9050 | 520 | 0.6730 | 0.5021 | 0.0 | 0.8750 | 0.5021 | -39.4969 | -25.0492 | -3.4868 | -3.4069 |
| 0.1743 | 3.0168 | 540 | 0.6875 | 0.3881 | 0.0 | 0.8000 | 0.3881 | -39.2651 | -26.1893 | -3.5315 | -3.4441 |
| 0.2073 | 3.1285 | 560 | 0.8458 | 0.1726 | 0.0 | 0.6250 | 0.1726 | -42.8931 | -28.3435 | -3.5456 | -3.4402 |
| 0.1496 | 3.2402 | 580 | 0.7679 | 0.2554 | 0.0 | 0.6750 | 0.2554 | -41.5129 | -27.5165 | -3.4945 | -3.4113 |
| 0.1503 | 3.3520 | 600 | 0.7629 | 0.2099 | 0.0 | 0.6000 | 0.2099 | -42.3083 | -27.9710 | -3.5254 | -3.4360 |
| 0.1578 | 3.4637 | 620 | 0.7733 | 0.1533 | 0.0 | 0.6000 | 0.1533 | -41.1396 | -28.5373 | -3.5135 | -3.4252 |
| 0.1335 | 3.5754 | 640 | 0.8319 | 0.1997 | 0.0 | 0.6000 | 0.1997 | -41.4449 | -28.0729 | -3.5372 | -3.4387 |
| 0.1696 | 3.6872 | 660 | 0.8017 | 0.2201 | 0.0 | 0.6500 | 0.2201 | -42.0740 | -27.8688 | -3.5692 | -3.4636 |
| 0.2641 | 3.7989 | 680 | 0.8066 | 0.2394 | 0.0 | 0.7250 | 0.2394 | -42.6891 | -27.6759 | -3.5624 | -3.4601 |
| 0.1268 | 3.9106 | 700 | 0.7793 | 0.3316 | 0.0 | 0.7500 | 0.3316 | -42.6197 | -26.7540 | -3.5242 | -3.4228 |
| 0.1236 | 4.0223 | 720 | 0.7696 | 0.3849 | 0.0 | 0.8000 | 0.3849 | -42.0404 | -26.2206 | -3.5181 | -3.4178 |
| 0.1061 | 4.1341 | 740 | 0.9666 | 0.1498 | 0.0 | 0.6750 | 0.1498 | -46.4133 | -28.5724 | -3.5567 | -3.4378 |
| 0.1186 | 4.2458 | 760 | 0.9323 | 0.1752 | 0.0 | 0.6000 | 0.1752 | -44.5664 | -28.3183 | -3.6046 | -3.4827 |
| 0.1112 | 4.3575 | 780 | 0.9042 | 0.1862 | 0.0 | 0.7000 | 0.1862 | -44.8347 | -28.2085 | -3.5715 | -3.4571 |
| 0.1463 | 4.4693 | 800 | 0.8225 | 0.2410 | 0.0 | 0.6750 | 0.2410 | -42.9694 | -27.6601 | -3.5655 | -3.4553 |
| 0.1564 | 4.5810 | 820 | 0.8811 | 0.1677 | 0.0 | 0.6250 | 0.1677 | -44.3126 | -28.3932 | -3.5381 | -3.4259 |
| 0.1985 | 4.6927 | 840 | 0.9132 | 0.1664 | 0.0 | 0.6000 | 0.1664 | -45.6402 | -28.4064 | -3.5504 | -3.4339 |
| 0.1374 | 4.8045 | 860 | 0.8452 | 0.1916 | 0.0 | 0.6000 | 0.1916 | -44.4828 | -28.1538 | -3.5435 | -3.4322 |
| 0.1626 | 4.9162 | 880 | 0.8745 | 0.1316 | 0.0 | 0.6000 | 0.1316 | -44.5274 | -28.7537 | -3.5277 | -3.4138 |
| 0.1003 | 5.0279 | 900 | 0.9217 | 0.0483 | 0.0 | 0.5000 | 0.0483 | -46.2505 | -29.5872 | -3.5361 | -3.4139 |
| 0.0927 | 5.1397 | 920 | 1.0600 | -0.0258 | 0.0 | 0.4750 | -0.0258 | -48.8408 | -30.3276 | -3.5497 | -3.4235 |
| 0.1022 | 5.2514 | 940 | 0.9659 | 0.0427 | 0.0 | 0.5500 | 0.0427 | -47.0927 | -29.6433 | -3.5460 | -3.4240 |
| 0.1090 | 5.3631 | 960 | 1.0517 | -0.1151 | 0.0 | 0.5000 | -0.1151 | -49.9369 | -31.2207 | -3.5436 | -3.4193 |
| 0.1338 | 5.4749 | 980 | 1.0318 | -0.0630 | 0.0 | 0.5250 | -0.0630 | -49.2989 | -30.6998 | -3.5513 | -3.4235 |
| 0.1032 | 5.5866 | 1000 | 1.0205 | -0.0941 | 0.0 | 0.5500 | -0.0941 | -48.3879 | -31.0113 | -3.5495 | -3.4234 |
| 0.0994 | 5.6983 | 1020 | 1.0377 | -0.0663 | 0.0 | 0.5500 | -0.0663 | -49.2657 | -30.7334 | -3.5589 | -3.4317 |
| 0.1406 | 5.8101 | 1040 | 1.0168 | -0.0241 | 0.0 | 0.5500 | -0.0241 | -49.0492 | -30.3108 | -3.5694 | -3.4398 |
| 0.1197 | 5.9218 | 1060 | 0.9964 | -0.0000 | 0.0 | 0.5750 | -0.0000 | -48.1696 | -30.0701 | -3.5505 | -3.4231 |
| 0.0783 | 6.0335 | 1080 | 1.0153 | -0.0575 | 0.0 | 0.5250 | -0.0575 | -48.8631 | -30.6448 | -3.5704 | -3.4415 |
| 0.0717 | 6.1453 | 1100 | 1.1374 | -0.1786 | 0.0 | 0.4750 | -0.1786 | -51.5463 | -31.8565 | -3.5625 | -3.4296 |
| 0.1010 | 6.2570 | 1120 | 1.1705 | -0.2142 | 0.0 | 0.4500 | -0.2142 | -52.2915 | -32.2120 | -3.5480 | -3.4154 |
| 0.1170 | 6.3687 | 1140 | 1.1203 | -0.1841 | 0.0 | 0.4250 | -0.1841 | -51.6086 | -31.9114 | -3.5513 | -3.4207 |
| 0.0950 | 6.4804 | 1160 | 1.1487 | -0.2081 | 0.0 | 0.4000 | -0.2081 | -51.3992 | -32.1507 | -3.5420 | -3.4088 |
| 0.0921 | 6.5922 | 1180 | 1.1640 | -0.2049 | 0.0 | 0.4750 | -0.2049 | -51.9258 | -32.1190 | -3.5373 | -3.4033 |
| 0.0818 | 6.7039 | 1200 | 1.1760 | -0.1961 | 0.0 | 0.5000 | -0.1961 | -52.2654 | -32.0309 | -3.5403 | -3.4070 |
| 0.0854 | 6.8156 | 1220 | 1.1823 | -0.2121 | 0.0 | 0.4500 | -0.2121 | -52.4322 | -32.1909 | -3.5436 | -3.4127 |
| 0.1399 | 6.9274 | 1240 | 1.1804 | -0.2021 | 0.0 | 0.4500 | -0.2021 | -52.6683 | -32.0915 | -3.5352 | -3.4029 |
| 0.0886 | 7.0391 | 1260 | 1.1777 | -0.1845 | 0.0 | 0.5000 | -0.1845 | -52.7352 | -31.9150 | -3.5362 | -3.4040 |
| 0.1105 | 7.1508 | 1280 | 1.2006 | -0.2368 | 0.0 | 0.4500 | -0.2368 | -52.9880 | -32.4379 | -3.5347 | -3.4014 |
| 0.0773 | 7.2626 | 1300 | 1.2167 | -0.2458 | 0.0 | 0.4500 | -0.2458 | -53.3666 | -32.5277 | -3.5404 | -3.4086 |
| 0.0836 | 7.3743 | 1320 | 1.2340 | -0.2461 | 0.0 | 0.4500 | -0.2461 | -53.7346 | -32.5307 | -3.5380 | -3.4057 |
| 0.1214 | 7.4860 | 1340 | 1.2435 | -0.2655 | 0.0 | 0.4500 | -0.2655 | -54.0182 | -32.7252 | -3.5406 | -3.4093 |
| 0.1150 | 7.5978 | 1360 | 1.2474 | -0.2650 | 0.0 | 0.4250 | -0.2650 | -54.0682 | -32.7199 | -3.5435 | -3.4125 |
| 0.0801 | 7.7095 | 1380 | 1.2451 | -0.2708 | 0.0 | 0.4500 | -0.2708 | -54.1074 | -32.7779 | -3.5365 | -3.4037 |
| 0.1084 | 7.8212 | 1400 | 1.2457 | -0.2575 | 0.0 | 0.4500 | -0.2575 | -53.9336 | -32.6446 | -3.5311 | -3.3971 |
| 0.1042 | 7.9330 | 1420 | 1.2461 | -0.2697 | 0.0 | 0.4500 | -0.2697 | -54.1756 | -32.7669 | -3.5368 | -3.4044 |
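The losses above come from the DPO objective with beta = 0.1; the `cdpo` tag in the model name suggests the conservative (label-smoothed) DPO variant. A minimal sketch of that loss on a single preference pair, assuming per-sequence log-probabilities are already computed (function and argument names are illustrative, not this repo's code):

```python
import math

def cdpo_loss(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp,
              beta=0.1, label_smoothing=0.1):
    """Conservative DPO: label-smoothed sigmoid loss on the reward margin."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    log_sigmoid = lambda x: -math.log1p(math.exp(-x))
    # Standard DPO is label_smoothing = 0; cDPO down-weights confident pairs
    # by mixing in the loss for the flipped preference label.
    return (-(1 - label_smoothing) * log_sigmoid(margin)
            - label_smoothing * log_sigmoid(-margin))

# Toy pair: the policy slightly prefers the chosen response vs. the reference.
print(cdpo_loss(-32.8, -53.9, -33.0, -53.0))
```

At a zero margin this loss equals log 2 (about 0.693), which matches where a DPO-style eval loss starts before training moves the margin.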
### Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3