Commit 491594b (verified) by YuchenLi01
1 Parent(s): c75db9b

Model save

README.md CHANGED
@@ -27,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-07_4try1WznBmWzfFFMvjYnNYdx82aFI0KpETh47ywHpUO7q6hc1CZ)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-07_4try1L6MZEeiA6MxB7QkBQtrKHAKiK8PfxXWvwuITdLl3LeoF5M)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
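The README hunk above links the DPO paper; for quick reference, below is a minimal sketch of the pairwise DPO objective described there. The function name, tensor arguments, and `beta=0.1` default are illustrative assumptions and are not taken from this repository's training configuration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Inputs are per-example summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model. beta=0.1 is an
    assumed placeholder, not the value used for this checkpoint.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = policy_logratios - ref_logratios
    return -F.logsigmoid(beta * logits).mean()
```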
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4142301192959094,
-    "train_runtime": 33606.4963,
+    "train_loss": 0.40314109663140724,
+    "train_runtime": 31848.6017,
     "train_samples": 45608,
-    "train_samples_per_second": 1.357,
-    "train_steps_per_second": 0.042
+    "train_samples_per_second": 1.432,
+    "train_steps_per_second": 0.045
 }
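As a quick consistency check on the updated metrics (not part of the commit itself), the reported throughput follows from the other fields; the effective batch size of 32 is inferred from the "ebs32" tag in the run name above.

```python
# Sanity check on the new all_results.json values (illustrative only).
train_samples = 45608
train_runtime = 31848.6017  # seconds

print(round(train_samples / train_runtime, 3))       # 1.432 == train_samples_per_second
print(round(train_samples / train_runtime / 32, 3))  # 0.045, matching train_steps_per_second
```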
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2c3d7fb02d59e309bdb31b9be2a9d4092b4af2d72a4849447ff4c7fd4d7ced6
+oid sha256:ba2a6cbcf44e327ce2c8a7da6dabda811e321245221f7921e0a2e05da8a65bef
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2842b911fa0e974aaea05470406ac83109676fb9b3339bddb785a65ad6522e7
+oid sha256:c1f82238fa7508c72426fafb4ed33bfee05b41981bb36d097de572c4308c776d
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0aac081d8f0709c9d9464c741a5620e9470d908496e6c1153a3e2e16935048dd
+oid sha256:adefb73fddbfd27a57f18a0d9305ee4c7fe00289bf38c16548828510369dd37d
 size 4540516344
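The three entries above are Git LFS pointer files, where `oid` is the SHA-256 of the actual shard. A minimal sketch for verifying a locally downloaded shard against its pointer follows; the local file path is assumed.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest taken from the updated pointer for shard 1 above.
expected = "ba2a6cbcf44e327ce2c8a7da6dabda811e321245221f7921e0a2e05da8a65bef"
print(sha256_of_file("model-00001-of-00003.safetensors") == expected)
```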
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4142301192959094,
-    "train_runtime": 33606.4963,
+    "train_loss": 0.40314109663140724,
+    "train_runtime": 31848.6017,
     "train_samples": 45608,
-    "train_samples_per_second": 1.357,
-    "train_steps_per_second": 0.042
+    "train_samples_per_second": 1.432,
+    "train_steps_per_second": 0.045
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff