Commit 491594b (verified) by YuchenLi01
1 Parent(s): c75db9b

Model save

README.md CHANGED
@@ -27,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-07_4try1WznBmWzfFFMvjYnNYdx82aFI0KpETh47ywHpUO7q6hc1CZ)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-07_4try1L6MZEeiA6MxB7QkBQtrKHAKiK8PfxXWvwuITdLl3LeoF5M)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
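The README hunk above links the DPO paper; for quick reference, below is a minimal sketch of the pairwise DPO objective described there. The function name, tensor arguments, and `beta=0.1` default are illustrative assumptions and are not taken from this repository's training configuration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Inputs are per-example summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model. beta=0.1 is an
    assumed placeholder, not the value used for this checkpoint.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = policy_logratios - ref_logratios
    return -F.logsigmoid(beta * logits).mean()
```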
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4142301192959094,
-    "train_runtime": 33606.4963,
+    "train_loss": 0.40314109663140724,
+    "train_runtime": 31848.6017,
     "train_samples": 45608,
-    "train_samples_per_second": 1.357,
-    "train_steps_per_second": 0.042
+    "train_samples_per_second": 1.432,
+    "train_steps_per_second": 0.045
 }
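As a quick consistency check on the updated metrics (not part of the commit itself), the reported throughput follows from the other fields; the effective batch size of 32 is inferred from the "ebs32" tag in the run name above.

```python
# Sanity check on the new all_results.json values (illustrative only).
train_samples = 45608
train_runtime = 31848.6017  # seconds

print(round(train_samples / train_runtime, 3))       # 1.432 == train_samples_per_second
print(round(train_samples / train_runtime / 32, 3))  # 0.045, matching train_steps_per_second
```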
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2c3d7fb02d59e309bdb31b9be2a9d4092b4af2d72a4849447ff4c7fd4d7ced6
+oid sha256:ba2a6cbcf44e327ce2c8a7da6dabda811e321245221f7921e0a2e05da8a65bef
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2842b911fa0e974aaea05470406ac83109676fb9b3339bddb785a65ad6522e7
+oid sha256:c1f82238fa7508c72426fafb4ed33bfee05b41981bb36d097de572c4308c776d
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0aac081d8f0709c9d9464c741a5620e9470d908496e6c1153a3e2e16935048dd
+oid sha256:adefb73fddbfd27a57f18a0d9305ee4c7fe00289bf38c16548828510369dd37d
 size 4540516344
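The three entries above are Git LFS pointer files, where `oid` is the SHA-256 of the actual shard. A minimal sketch for verifying a locally downloaded shard against its pointer follows; the local file path is assumed.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest taken from the updated pointer for shard 1 above.
expected = "ba2a6cbcf44e327ce2c8a7da6dabda811e321245221f7921e0a2e05da8a65bef"
print(sha256_of_file("model-00001-of-00003.safetensors") == expected)
```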
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4142301192959094,
-    "train_runtime": 33606.4963,
+    "train_loss": 0.40314109663140724,
+    "train_runtime": 31848.6017,
     "train_samples": 45608,
-    "train_samples_per_second": 1.357,
-    "train_steps_per_second": 0.042
+    "train_samples_per_second": 1.432,
+    "train_steps_per_second": 0.045
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff