penfever committed 0628306 (verified) · 1 parent: 8f8ddb4

End of training

Files changed (5)
  1. README.md +2 -1
  2. all_results.json +16 -0
  3. train_results.json +16 -0
  4. trainer_state.json +3490 -0
  5. training_loss.png +0 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
  base_model: Qwen/Qwen3-8B
  tags:
  - llama-factory
+ - full
  - generated_from_trainer
  model-index:
  - name: code-contests-sandboxes-traces-terminus-2_global-batch-size_32
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # code-contests-sandboxes-traces-terminus-2_global-batch-size_32
 
- This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on an unknown dataset.
+ This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the mlfoundations-dev/code-contests-sandboxes-traces-terminus-2 dataset.
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "achieved_tflops_per_gpu": 2.6369613385336814,
+ "achieved_tflops_per_gpu_theoretical": 304.9114106540851,
+ "epoch": 5.0,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3492066562175751,
+ "mfu_percent": 0.18635769176916478,
+ "mfu_percent_theoretical": 21.54850958686114,
+ "total_flos": 7.336290139934556e+17,
+ "train_loss": 0.37341142744301986,
+ "train_runtime": 17388.125,
+ "train_samples_per_second": 2.876,
+ "train_steps_per_second": 0.09,
+ "valid_targets_mean": 3627.8,
+ "valid_targets_min": 954
+ }
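The summary metrics above can be cross-checked against each other. The sketch below is a minimal consistency check, not the training framework's own code: it assumes `mfu_percent` is achieved TFLOPS divided by a fixed per-GPU peak, and that `achieved_tflops_per_gpu` spreads `total_flos` evenly over `train_runtime` and the GPUs. Neither definition is stated in this commit, and `derived_metrics` is a hypothetical helper.

```python
# Values copied from all_results.json in this commit.
results = {
    "achieved_tflops_per_gpu": 2.6369613385336814,
    "achieved_tflops_per_gpu_theoretical": 304.9114106540851,
    "mfu_percent": 0.18635769176916478,
    "mfu_percent_theoretical": 21.54850958686114,
    "total_flos": 7.336290139934556e17,
    "train_runtime": 17388.125,  # seconds
    "train_samples_per_second": 2.876,
    "train_steps_per_second": 0.09,
    "epoch": 5.0,
}


def derived_metrics(r):
    """Back out quantities implied by the logged metrics.

    Assumption (not stated in the commit): MFU% = achieved / peak * 100.
    """
    # Peak TFLOPS per GPU implied by each (achieved, MFU%) pair.
    peak = r["achieved_tflops_per_gpu"] / (r["mfu_percent"] / 100)
    peak_theoretical = r["achieved_tflops_per_gpu_theoretical"] / (
        r["mfu_percent_theoretical"] / 100
    )
    # Aggregate throughput in TFLOP/s, and how many per-GPU shares it holds.
    total_tflops = r["total_flos"] / r["train_runtime"] / 1e12
    gpu_equivalents = total_tflops / r["achieved_tflops_per_gpu"]
    # Optimizer steps and samples processed over the whole run.
    steps = r["train_steps_per_second"] * r["train_runtime"]
    samples = r["train_samples_per_second"] * r["train_runtime"]
    return {
        "implied_peak_tflops": peak,
        "implied_peak_tflops_theoretical": peak_theoretical,
        "total_tflops": total_tflops,
        "gpu_equivalents": gpu_equivalents,
        "approx_steps": steps,
        "approx_samples_per_epoch": samples / r["epoch"],
    }


for name, value in derived_metrics(results).items():
    print(f"{name}: {value:.2f}")
```

With these values both MFU variants imply the same peak of about 1415 TFLOPS per GPU. The aggregate throughput (`total_flos` / `train_runtime`, about 42.2 TFLOP/s) is 16 times the per-GPU figure, which is consistent with, but does not confirm, a 16-GPU run. Steps per second times runtime comes to about 1565, matching `global_step` in trainer_state.json, and samples per second times runtime gives roughly 50,000 samples, or about 10,000 per epoch over 5 epochs.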
train_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "achieved_tflops_per_gpu": 2.6369613385336814,
+ "achieved_tflops_per_gpu_theoretical": 304.9114106540851,
+ "epoch": 5.0,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3492066562175751,
+ "mfu_percent": 0.18635769176916478,
+ "mfu_percent_theoretical": 21.54850958686114,
+ "total_flos": 7.336290139934556e+17,
+ "train_loss": 0.37341142744301986,
+ "train_runtime": 17388.125,
+ "train_samples_per_second": 2.876,
+ "train_steps_per_second": 0.09,
+ "valid_targets_mean": 3627.8,
+ "valid_targets_min": 954
+ }
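The `learning_rate` values logged in trainer_state.json (below) rise linearly for roughly the first 157 steps and then decay along a cosine curve. The sketch reproduces them under assumptions inferred by fitting the logged numbers, not read from any config in this commit: a peak rate of 4e-5, 157 warmup steps (about 10% of the 1565 total steps, rounded up), decay to zero at step 1565, and a one-step offset between the logged `step` and the scheduler step. The shape matches a cosine-with-warmup schedule such as Hugging Face transformers' `get_cosine_schedule_with_warmup`, but `lr_at` and `logged_lr` are hypothetical helpers written from scratch.

```python
import math

# Assumptions inferred from the logged learning_rate values (not in this commit):
PEAK_LR = 4e-5       # maximum learning rate
TOTAL_STEPS = 1565   # "global_step" in trainer_state.json
WARMUP_STEPS = 157   # about 10% of TOTAL_STEPS, rounded up


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def logged_lr(step: int) -> float:
    """Assumed offset: the log entry for step s matches the schedule at s - 1."""
    return lr_at(step - 1)


# A few (step, learning_rate) pairs copied from log_history.
LOGGED = {
    5: 1.0191082802547772e-06,
    100: 2.5222929936305732e-05,
    160: 3.999980086219931e-05,
    400: 3.7154572200005446e-05,
}

for step, expected in LOGGED.items():
    print(step, expected, logged_lr(step))
```

Evaluating the schedule at `step` instead of `step - 1` overshoots the warmup values by one increment (for example 1.27e-06 instead of the logged 1.02e-06 at step 5), which is why the offset is modelled explicitly.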
trainer_state.json ADDED
@@ -0,0 +1,3490 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 5.0,
+ "eval_steps": 500,
+ "global_step": 1565,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.016,
+ "grad_norm": 5.69872959186611,
+ "learning_rate": 1.0191082802547772e-06,
+ "loss": 0.6974,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3614201545715332,
+ "step": 5,
+ "valid_targets_mean": 3866.1,
+ "valid_targets_min": 867
+ },
+ {
+ "epoch": 0.032,
+ "grad_norm": 5.164438631475304,
+ "learning_rate": 2.2929936305732485e-06,
+ "loss": 0.6778,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3798177242279053,
+ "step": 10,
+ "valid_targets_mean": 3772.9,
+ "valid_targets_min": 1172
+ },
+ {
+ "epoch": 0.048,
+ "grad_norm": 3.1544536534246252,
+ "learning_rate": 3.56687898089172e-06,
+ "loss": 0.6826,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.33693352341651917,
+ "step": 15,
+ "valid_targets_mean": 5926.4,
+ "valid_targets_min": 955
+ },
+ {
+ "epoch": 0.064,
+ "grad_norm": 2.0390987903237137,
+ "learning_rate": 4.840764331210192e-06,
+ "loss": 0.6153,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3763325810432434,
+ "step": 20,
+ "valid_targets_mean": 5537.4,
+ "valid_targets_min": 684
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 1.2724968440026243,
+ "learning_rate": 6.114649681528663e-06,
+ "loss": 0.5933,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2624075710773468,
+ "step": 25,
+ "valid_targets_mean": 3464.9,
+ "valid_targets_min": 557
+ },
+ {
+ "epoch": 0.096,
+ "grad_norm": 0.8015927473817406,
+ "learning_rate": 7.388535031847134e-06,
+ "loss": 0.5901,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2664138078689575,
+ "step": 30,
+ "valid_targets_mean": 4067.4,
+ "valid_targets_min": 422
+ },
+ {
+ "epoch": 0.112,
+ "grad_norm": 0.7590646795996507,
+ "learning_rate": 8.662420382165606e-06,
+ "loss": 0.5455,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.27429863810539246,
+ "step": 35,
+ "valid_targets_mean": 3136.5,
+ "valid_targets_min": 596
+ },
+ {
+ "epoch": 0.128,
+ "grad_norm": 0.5768938436627202,
+ "learning_rate": 9.936305732484078e-06,
+ "loss": 0.5396,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.25026804208755493,
+ "step": 40,
+ "valid_targets_mean": 3707.1,
+ "valid_targets_min": 905
+ },
+ {
+ "epoch": 0.144,
+ "grad_norm": 0.545771347093659,
+ "learning_rate": 1.1210191082802548e-05,
+ "loss": 0.5283,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2582656741142273,
+ "step": 45,
+ "valid_targets_mean": 3181.2,
+ "valid_targets_min": 1120
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 0.5029841857856351,
+ "learning_rate": 1.248407643312102e-05,
+ "loss": 0.5368,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.35082149505615234,
+ "step": 50,
+ "valid_targets_mean": 5725.4,
+ "valid_targets_min": 665
+ },
+ {
+ "epoch": 0.176,
+ "grad_norm": 0.4722335098176793,
+ "learning_rate": 1.375796178343949e-05,
+ "loss": 0.5508,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3100292384624481,
+ "step": 55,
+ "valid_targets_mean": 4060.9,
+ "valid_targets_min": 671
+ },
+ {
+ "epoch": 0.192,
+ "grad_norm": 0.4393102585880352,
+ "learning_rate": 1.5031847133757964e-05,
+ "loss": 0.4724,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2322624772787094,
+ "step": 60,
+ "valid_targets_mean": 3568.0,
+ "valid_targets_min": 727
+ },
+ {
+ "epoch": 0.208,
+ "grad_norm": 0.46984637985358174,
+ "learning_rate": 1.6305732484076436e-05,
+ "loss": 0.4841,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22150176763534546,
+ "step": 65,
+ "valid_targets_mean": 2860.4,
+ "valid_targets_min": 1026
+ },
+ {
+ "epoch": 0.224,
+ "grad_norm": 0.38610287129563536,
+ "learning_rate": 1.7579617834394907e-05,
+ "loss": 0.4838,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18804149329662323,
+ "step": 70,
+ "valid_targets_mean": 3148.8,
+ "valid_targets_min": 1224
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 0.5071493123995814,
+ "learning_rate": 1.8853503184713376e-05,
+ "loss": 0.4685,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22837616503238678,
+ "step": 75,
+ "valid_targets_mean": 2380.9,
+ "valid_targets_min": 760
+ },
+ {
+ "epoch": 0.256,
+ "grad_norm": 0.39991978336257084,
+ "learning_rate": 2.0127388535031848e-05,
+ "loss": 0.473,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.23189273476600647,
+ "step": 80,
+ "valid_targets_mean": 3358.9,
+ "valid_targets_min": 811
+ },
+ {
+ "epoch": 0.272,
+ "grad_norm": 0.3750148456903095,
+ "learning_rate": 2.140127388535032e-05,
+ "loss": 0.4527,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22813287377357483,
+ "step": 85,
+ "valid_targets_mean": 5757.9,
+ "valid_targets_min": 890
+ },
+ {
+ "epoch": 0.288,
+ "grad_norm": 0.403265544100627,
+ "learning_rate": 2.267515923566879e-05,
+ "loss": 0.4428,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18977582454681396,
+ "step": 90,
+ "valid_targets_mean": 3059.4,
+ "valid_targets_min": 790
+ },
+ {
+ "epoch": 0.304,
+ "grad_norm": 0.4536180824055258,
+ "learning_rate": 2.3949044585987263e-05,
+ "loss": 0.438,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.23432332277297974,
+ "step": 95,
+ "valid_targets_mean": 3082.3,
+ "valid_targets_min": 803
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 0.3512124065455802,
+ "learning_rate": 2.5222929936305732e-05,
+ "loss": 0.4555,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21950744092464447,
+ "step": 100,
+ "valid_targets_mean": 4653.7,
+ "valid_targets_min": 1002
+ },
+ {
+ "epoch": 0.336,
+ "grad_norm": 0.3089899886308942,
+ "learning_rate": 2.6496815286624204e-05,
+ "loss": 0.4491,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19772186875343323,
+ "step": 105,
+ "valid_targets_mean": 4689.6,
+ "valid_targets_min": 701
+ },
+ {
+ "epoch": 0.352,
+ "grad_norm": 0.34589835913470146,
+ "learning_rate": 2.7770700636942676e-05,
+ "loss": 0.4341,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22734415531158447,
+ "step": 110,
+ "valid_targets_mean": 5929.0,
+ "valid_targets_min": 737
+ },
+ {
+ "epoch": 0.368,
+ "grad_norm": 0.333900665380794,
+ "learning_rate": 2.9044585987261148e-05,
+ "loss": 0.4147,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15204599499702454,
+ "step": 115,
+ "valid_targets_mean": 4069.6,
+ "valid_targets_min": 807
+ },
+ {
+ "epoch": 0.384,
+ "grad_norm": 0.4286792724989212,
+ "learning_rate": 3.0318471337579623e-05,
+ "loss": 0.4335,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2184617519378662,
+ "step": 120,
+ "valid_targets_mean": 2641.9,
+ "valid_targets_min": 651
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 0.4994416444087381,
+ "learning_rate": 3.1592356687898095e-05,
+ "loss": 0.4316,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13226771354675293,
+ "step": 125,
+ "valid_targets_mean": 2113.8,
+ "valid_targets_min": 619
+ },
+ {
+ "epoch": 0.416,
+ "grad_norm": 0.4533713886076989,
+ "learning_rate": 3.286624203821656e-05,
+ "loss": 0.4146,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18592208623886108,
+ "step": 130,
+ "valid_targets_mean": 2183.4,
+ "valid_targets_min": 624
+ },
+ {
+ "epoch": 0.432,
+ "grad_norm": 0.4063059240583294,
+ "learning_rate": 3.414012738853504e-05,
+ "loss": 0.4358,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.25601351261138916,
+ "step": 135,
+ "valid_targets_mean": 4597.6,
+ "valid_targets_min": 1140
+ },
+ {
+ "epoch": 0.448,
+ "grad_norm": 0.36028533700369003,
+ "learning_rate": 3.541401273885351e-05,
+ "loss": 0.4056,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21660026907920837,
+ "step": 140,
+ "valid_targets_mean": 4851.8,
+ "valid_targets_min": 920
+ },
+ {
+ "epoch": 0.464,
+ "grad_norm": 0.37494642923573474,
+ "learning_rate": 3.6687898089171976e-05,
+ "loss": 0.4364,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2517198324203491,
+ "step": 145,
+ "valid_targets_mean": 4430.9,
+ "valid_targets_min": 1251
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 0.416006623617401,
+ "learning_rate": 3.796178343949045e-05,
+ "loss": 0.4304,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.257464200258255,
+ "step": 150,
+ "valid_targets_mean": 5269.2,
+ "valid_targets_min": 699
+ },
+ {
+ "epoch": 0.496,
+ "grad_norm": 0.4201384306903681,
+ "learning_rate": 3.923566878980892e-05,
+ "loss": 0.419,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.26475703716278076,
+ "step": 155,
+ "valid_targets_mean": 4737.1,
+ "valid_targets_min": 702
+ },
+ {
+ "epoch": 0.512,
+ "grad_norm": 0.37008181636461424,
+ "learning_rate": 3.999980086219931e-05,
+ "loss": 0.4124,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16652914881706238,
+ "step": 160,
+ "valid_targets_mean": 2959.7,
+ "valid_targets_min": 586
+ },
+ {
+ "epoch": 0.528,
+ "grad_norm": 0.4076042174730073,
+ "learning_rate": 3.9997560607483595e-05,
+ "loss": 0.4087,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16840457916259766,
+ "step": 165,
+ "valid_targets_mean": 3294.4,
+ "valid_targets_min": 732
+ },
+ {
+ "epoch": 0.544,
+ "grad_norm": 0.32841247716849165,
+ "learning_rate": 3.999283145555291e-05,
+ "loss": 0.4026,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19586911797523499,
+ "step": 170,
+ "valid_targets_mean": 5450.7,
+ "valid_targets_min": 693
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 0.41779856510643243,
+ "learning_rate": 3.998561399499772e-05,
+ "loss": 0.3959,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.174172043800354,
+ "step": 175,
+ "valid_targets_mean": 4048.8,
+ "valid_targets_min": 916
+ },
+ {
+ "epoch": 0.576,
+ "grad_norm": 0.4741559609057003,
+ "learning_rate": 3.997590912410345e-05,
+ "loss": 0.406,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17835122346878052,
+ "step": 180,
+ "valid_targets_mean": 2691.6,
+ "valid_targets_min": 878
+ },
+ {
+ "epoch": 0.592,
+ "grad_norm": 0.4233303846560544,
+ "learning_rate": 3.996371805073874e-05,
+ "loss": 0.4124,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19907709956169128,
+ "step": 185,
+ "valid_targets_mean": 3547.6,
+ "valid_targets_min": 582
+ },
+ {
+ "epoch": 0.608,
+ "grad_norm": 0.5519863740267512,
+ "learning_rate": 3.994904229220507e-05,
+ "loss": 0.4131,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2815743386745453,
+ "step": 190,
+ "valid_targets_mean": 5913.5,
+ "valid_targets_min": 789
+ },
+ {
+ "epoch": 0.624,
+ "grad_norm": 0.3853735917618442,
+ "learning_rate": 3.9931883675047966e-05,
+ "loss": 0.4107,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2598858177661896,
+ "step": 195,
+ "valid_targets_mean": 5340.4,
+ "valid_targets_min": 1088
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 0.38155567667116275,
+ "learning_rate": 3.991224433482961e-05,
+ "loss": 0.3849,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16318747401237488,
+ "step": 200,
+ "valid_targets_mean": 3579.7,
+ "valid_targets_min": 598
+ },
+ {
+ "epoch": 0.656,
+ "grad_norm": 0.3511821690187007,
+ "learning_rate": 3.98901267158631e-05,
+ "loss": 0.3904,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13865548372268677,
+ "step": 205,
+ "valid_targets_mean": 3389.2,
+ "valid_targets_min": 740
+ },
+ {
+ "epoch": 0.672,
+ "grad_norm": 0.35229572545058246,
+ "learning_rate": 3.98655335709082e-05,
+ "loss": 0.4191,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13957738876342773,
+ "step": 210,
+ "valid_targets_mean": 2704.1,
+ "valid_targets_min": 784
+ },
+ {
+ "epoch": 0.688,
+ "grad_norm": 0.35386211696296843,
+ "learning_rate": 3.9838467960828745e-05,
+ "loss": 0.4035,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20394352078437805,
+ "step": 215,
+ "valid_targets_mean": 4380.9,
+ "valid_targets_min": 546
+ },
+ {
+ "epoch": 0.704,
+ "grad_norm": 0.3704950358052972,
+ "learning_rate": 3.9808933254211665e-05,
+ "loss": 0.4189,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21493899822235107,
+ "step": 220,
+ "valid_targets_mean": 4052.7,
+ "valid_targets_min": 692
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 0.4196268184599092,
+ "learning_rate": 3.977693312694778e-05,
+ "loss": 0.3944,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.188992440700531,
+ "step": 225,
+ "valid_targets_mean": 3296.3,
+ "valid_targets_min": 752
+ },
+ {
+ "epoch": 0.736,
+ "grad_norm": 0.3908305933846935,
+ "learning_rate": 3.974247156177423e-05,
+ "loss": 0.4177,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16833311319351196,
+ "step": 230,
+ "valid_targets_mean": 3082.8,
+ "valid_targets_min": 623
+ },
+ {
+ "epoch": 0.752,
+ "grad_norm": 0.4976376129445208,
+ "learning_rate": 3.970555284777883e-05,
+ "loss": 0.4163,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21885736286640167,
+ "step": 235,
+ "valid_targets_mean": 3845.2,
+ "valid_targets_min": 787
+ },
+ {
+ "epoch": 0.768,
+ "grad_norm": 0.4175782511172779,
+ "learning_rate": 3.9666181579866244e-05,
+ "loss": 0.4175,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.25375479459762573,
+ "step": 240,
+ "valid_targets_mean": 4226.0,
+ "valid_targets_min": 701
+ },
+ {
+ "epoch": 0.784,
+ "grad_norm": 0.38370492586706956,
+ "learning_rate": 3.962436265818611e-05,
+ "loss": 0.4195,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22604212164878845,
+ "step": 245,
+ "valid_targets_mean": 4141.8,
+ "valid_targets_min": 832
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 0.3352818127708385,
+ "learning_rate": 3.9580101287523105e-05,
+ "loss": 0.4106,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13106878101825714,
+ "step": 250,
+ "valid_targets_mean": 2769.6,
+ "valid_targets_min": 632
+ },
+ {
+ "epoch": 0.816,
+ "grad_norm": 0.35918802953549456,
+ "learning_rate": 3.953340297664928e-05,
+ "loss": 0.4023,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20013932883739471,
+ "step": 255,
+ "valid_targets_mean": 5036.6,
+ "valid_targets_min": 845
+ },
+ {
+ "epoch": 0.832,
+ "grad_norm": 0.46404548019076985,
+ "learning_rate": 3.948427353763829e-05,
+ "loss": 0.4259,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2261233776807785,
+ "step": 260,
+ "valid_targets_mean": 3119.8,
+ "valid_targets_min": 1146
+ },
+ {
+ "epoch": 0.848,
+ "grad_norm": 0.6034877544521583,
+ "learning_rate": 3.943271908514216e-05,
+ "loss": 0.3996,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20489788055419922,
+ "step": 265,
+ "valid_targets_mean": 3835.4,
+ "valid_targets_min": 585
+ },
+ {
+ "epoch": 0.864,
+ "grad_norm": 0.3776305065926428,
+ "learning_rate": 3.937874603563015e-05,
+ "loss": 0.3949,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17166143655776978,
+ "step": 270,
+ "valid_targets_mean": 4574.4,
+ "valid_targets_min": 574
+ },
+ {
+ "epoch": 0.88,
+ "grad_norm": 0.43899093850067783,
+ "learning_rate": 3.932236110659023e-05,
+ "loss": 0.4057,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1660895049571991,
+ "step": 275,
+ "valid_targets_mean": 2570.5,
+ "valid_targets_min": 852
+ },
+ {
+ "epoch": 0.896,
+ "grad_norm": 0.4836232981284088,
+ "learning_rate": 3.9263571315692976e-05,
+ "loss": 0.42,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1986091136932373,
+ "step": 280,
+ "valid_targets_mean": 4396.8,
+ "valid_targets_min": 779
+ },
+ {
+ "epoch": 0.912,
+ "grad_norm": 0.4429870800850433,
+ "learning_rate": 3.920238397991818e-05,
+ "loss": 0.4315,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.188503235578537,
+ "step": 285,
+ "valid_targets_mean": 2363.9,
+ "valid_targets_min": 489
+ },
+ {
+ "epoch": 0.928,
+ "grad_norm": 0.4051710874433319,
+ "learning_rate": 3.913880671464418e-05,
+ "loss": 0.4256,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1489955335855484,
+ "step": 290,
+ "valid_targets_mean": 2600.9,
+ "valid_targets_min": 884
+ },
+ {
+ "epoch": 0.944,
+ "grad_norm": 0.3825741990919869,
+ "learning_rate": 3.907284743270001e-05,
+ "loss": 0.3981,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16852974891662598,
+ "step": 295,
+ "valid_targets_mean": 2941.3,
+ "valid_targets_min": 610
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 0.35337156229499256,
+ "learning_rate": 3.900451434338062e-05,
+ "loss": 0.4173,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19359055161476135,
+ "step": 300,
+ "valid_targets_mean": 5110.8,
+ "valid_targets_min": 725
+ },
+ {
+ "epoch": 0.976,
+ "grad_norm": 0.3630355820027396,
+ "learning_rate": 3.893381595142511e-05,
+ "loss": 0.4033,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2465812861919403,
+ "step": 305,
+ "valid_targets_mean": 4703.4,
+ "valid_targets_min": 679
+ },
+ {
+ "epoch": 0.992,
+ "grad_norm": 0.332578670862645,
+ "learning_rate": 3.886076105595825e-05,
+ "loss": 0.3935,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16100963950157166,
+ "step": 310,
+ "valid_targets_mean": 4606.3,
+ "valid_targets_min": 771
+ },
+ {
+ "epoch": 1.0064,
+ "grad_norm": 0.3699048218897067,
+ "learning_rate": 3.878535874939532e-05,
+ "loss": 0.3777,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19179093837738037,
+ "step": 315,
+ "valid_targets_mean": 4326.3,
+ "valid_targets_min": 1151
+ },
+ {
+ "epoch": 1.0224,
+ "grad_norm": 0.35993800947449023,
+ "learning_rate": 3.870761841631051e-05,
+ "loss": 0.3994,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17964524030685425,
+ "step": 320,
+ "valid_targets_mean": 3957.6,
+ "valid_targets_min": 703
+ },
+ {
+ "epoch": 1.0384,
+ "grad_norm": 0.4176440488699317,
+ "learning_rate": 3.862754973226887e-05,
+ "loss": 0.3892,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15542760491371155,
+ "step": 325,
+ "valid_targets_mean": 2751.8,
+ "valid_targets_min": 379
+ },
+ {
+ "epoch": 1.0544,
+ "grad_norm": 0.40074497319972846,
+ "learning_rate": 3.85451626626221e-05,
+ "loss": 0.3831,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17138831317424774,
+ "step": 330,
+ "valid_targets_mean": 2887.1,
+ "valid_targets_min": 871
+ },
+ {
+ "epoch": 1.0704,
+ "grad_norm": 0.34919217958278553,
+ "learning_rate": 3.846046746126827e-05,
+ "loss": 0.3805,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1968536376953125,
+ "step": 335,
+ "valid_targets_mean": 5225.8,
+ "valid_targets_min": 766
+ },
+ {
+ "epoch": 1.0864,
+ "grad_norm": 0.3824861551768386,
+ "learning_rate": 3.837347466937562e-05,
+ "loss": 0.394,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1652269959449768,
+ "step": 340,
+ "valid_targets_mean": 3592.2,
+ "valid_targets_min": 371
+ },
+ {
+ "epoch": 1.1024,
+ "grad_norm": 0.48727554921232685,
+ "learning_rate": 3.828419511407062e-05,
+ "loss": 0.3926,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2775905728340149,
+ "step": 345,
+ "valid_targets_mean": 3552.9,
+ "valid_targets_min": 790
+ },
+ {
+ "epoch": 1.1184,
+ "grad_norm": 0.331694685748023,
+ "learning_rate": 3.819263990709037e-05,
+ "loss": 0.3926,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17502962052822113,
+ "step": 350,
+ "valid_targets_mean": 4362.4,
+ "valid_targets_min": 934
+ },
+ {
+ "epoch": 1.1344,
+ "grad_norm": 0.4491397219899234,
+ "learning_rate": 3.809882044339971e-05,
+ "loss": 0.4077,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19338899850845337,
+ "step": 355,
+ "valid_targets_mean": 3287.8,
+ "valid_targets_min": 630
+ },
+ {
+ "epoch": 1.1504,
+ "grad_norm": 0.34406293508467406,
+ "learning_rate": 3.800274839977293e-05,
+ "loss": 0.3706,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18019019067287445,
+ "step": 360,
+ "valid_targets_mean": 5227.4,
+ "valid_targets_min": 764
+ },
+ {
+ "epoch": 1.1663999999999999,
+ "grad_norm": 0.38705503675482716,
+ "learning_rate": 3.790443573334055e-05,
+ "loss": 0.3704,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18869172036647797,
+ "step": 365,
+ "valid_targets_mean": 4310.2,
+ "valid_targets_min": 760
+ },
+ {
+ "epoch": 1.1824,
+ "grad_norm": 0.42059597448080804,
+ "learning_rate": 3.780389468010106e-05,
+ "loss": 0.3843,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2906254827976227,
+ "step": 370,
+ "valid_targets_mean": 5065.8,
+ "valid_targets_min": 939
+ },
+ {
+ "epoch": 1.1984,
+ "grad_norm": 0.3896344271054351,
+ "learning_rate": 3.7701137753398075e-05,
+ "loss": 0.3806,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21188490092754364,
+ "step": 375,
+ "valid_targets_mean": 4391.9,
+ "valid_targets_min": 816
+ },
+ {
+ "epoch": 1.2144,
+ "grad_norm": 0.35064060810634223,
+ "learning_rate": 3.759617774236292e-05,
+ "loss": 0.3849,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21180033683776855,
+ "step": 380,
+ "valid_targets_mean": 5338.6,
+ "valid_targets_min": 805
+ },
+ {
+ "epoch": 1.2304,
+ "grad_norm": 0.3999618540862411,
+ "learning_rate": 3.748902771032288e-05,
+ "loss": 0.3863,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14589013159275055,
+ "step": 385,
+ "valid_targets_mean": 2809.4,
+ "valid_targets_min": 718
+ },
+ {
+ "epoch": 1.2464,
+ "grad_norm": 0.3425508807463841,
+ "learning_rate": 3.737970099317535e-05,
+ "loss": 0.3805,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19089946150779724,
+ "step": 390,
+ "valid_targets_mean": 4076.4,
+ "valid_targets_min": 404
+ },
+ {
+ "epoch": 1.2624,
+ "grad_norm": 0.3232819729746728,
+ "learning_rate": 3.726821119772803e-05,
+ "loss": 0.3981,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18869639933109283,
+ "step": 395,
+ "valid_targets_mean": 4955.0,
+ "valid_targets_min": 779
+ },
+ {
+ "epoch": 1.2784,
+ "grad_norm": 0.32513134903898305,
+ "learning_rate": 3.7154572200005446e-05,
+ "loss": 0.3655,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.158743754029274,
+ "step": 400,
+ "valid_targets_mean": 4409.9,
890
+ "valid_targets_min": 673
891
+ },
892
+ {
893
+ "epoch": 1.2944,
894
+ "grad_norm": 0.41954006335784794,
895
+ "learning_rate": 3.703879814352193e-05,
896
+ "loss": 0.3948,
897
+ "loss_nan_ranks": 0,
898
+ "loss_rank_avg": 0.2265087366104126,
899
+ "step": 405,
900
+ "valid_targets_mean": 4266.3,
901
+ "valid_targets_min": 649
902
+ },
903
+ {
904
+ "epoch": 1.3104,
905
+ "grad_norm": 0.31152266082551305,
906
+ "learning_rate": 3.6920903437521305e-05,
907
+ "loss": 0.3539,
908
+ "loss_nan_ranks": 0,
909
+ "loss_rank_avg": 0.14884796738624573,
910
+ "step": 410,
911
+ "valid_targets_mean": 4143.8,
912
+ "valid_targets_min": 645
913
+ },
914
+ {
915
+ "epoch": 1.3264,
916
+ "grad_norm": 0.3874174052877045,
917
+ "learning_rate": 3.680090275518352e-05,
918
+ "loss": 0.39,
919
+ "loss_nan_ranks": 0,
920
+ "loss_rank_avg": 0.14476105570793152,
921
+ "step": 415,
922
+ "valid_targets_mean": 2301.5,
923
+ "valid_targets_min": 979
924
+ },
925
+ {
926
+ "epoch": 1.3424,
927
+ "grad_norm": 0.4076202345092236,
928
+ "learning_rate": 3.667881103179844e-05,
929
+ "loss": 0.406,
930
+ "loss_nan_ranks": 0,
931
+ "loss_rank_avg": 0.1450413465499878,
932
+ "step": 420,
933
+ "valid_targets_mean": 2344.8,
934
+ "valid_targets_min": 764
935
+ },
936
+ {
937
+ "epoch": 1.3584,
938
+ "grad_norm": 0.36629328269123096,
939
+ "learning_rate": 3.655464346290697e-05,
940
+ "loss": 0.3553,
941
+ "loss_nan_ranks": 0,
942
+ "loss_rank_avg": 0.1950920820236206,
943
+ "step": 425,
944
+ "valid_targets_mean": 3503.9,
945
+ "valid_targets_min": 786
946
+ },
947
+ {
948
+ "epoch": 1.3744,
949
+ "grad_norm": 0.41610685727170865,
950
+ "learning_rate": 3.642841550240983e-05,
951
+ "loss": 0.3875,
952
+ "loss_nan_ranks": 0,
953
+ "loss_rank_avg": 0.1455891728401184,
954
+ "step": 430,
955
+ "valid_targets_mean": 2554.6,
956
+ "valid_targets_min": 799
957
+ },
958
+ {
959
+ "epoch": 1.3904,
960
+ "grad_norm": 0.366729970723816,
961
+ "learning_rate": 3.630014286064419e-05,
962
+ "loss": 0.3591,
963
+ "loss_nan_ranks": 0,
964
+ "loss_rank_avg": 0.1807524859905243,
965
+ "step": 435,
966
+ "valid_targets_mean": 3847.1,
967
+ "valid_targets_min": 689
968
+ },
969
+ {
970
+ "epoch": 1.4064,
971
+ "grad_norm": 0.32794983321715615,
972
+ "learning_rate": 3.6169841502428285e-05,
973
+ "loss": 0.3735,
974
+ "loss_nan_ranks": 0,
975
+ "loss_rank_avg": 0.14754189550876617,
976
+ "step": 440,
977
+ "valid_targets_mean": 3818.2,
978
+ "valid_targets_min": 541
979
+ },
980
+ {
981
+ "epoch": 1.4224,
982
+ "grad_norm": 0.44011771339583505,
983
+ "learning_rate": 3.603752764507454e-05,
984
+ "loss": 0.4034,
985
+ "loss_nan_ranks": 0,
986
+ "loss_rank_avg": 0.20290511846542358,
987
+ "step": 445,
988
+ "valid_targets_mean": 2867.4,
989
+ "valid_targets_min": 924
990
+ },
991
+ {
992
+ "epoch": 1.4384000000000001,
993
+ "grad_norm": 0.3875724203422835,
994
+ "learning_rate": 3.5903217756371066e-05,
995
+ "loss": 0.39,
996
+ "loss_nan_ranks": 0,
997
+ "loss_rank_avg": 0.24192467331886292,
998
+ "step": 450,
999
+ "valid_targets_mean": 4539.0,
1000
+ "valid_targets_min": 686
1001
+ },
1002
+ {
1003
+ "epoch": 1.4544000000000001,
1004
+ "grad_norm": 0.35423009985165277,
1005
+ "learning_rate": 3.576692855253213e-05,
1006
+ "loss": 0.3669,
1007
+ "loss_nan_ranks": 0,
1008
+ "loss_rank_avg": 0.1834782361984253,
1009
+ "step": 455,
1010
+ "valid_targets_mean": 4385.8,
1011
+ "valid_targets_min": 740
1012
+ },
1013
+ {
1014
+ "epoch": 1.4704,
1015
+ "grad_norm": 0.34329690274477853,
1016
+ "learning_rate": 3.562867699611764e-05,
1017
+ "loss": 0.3522,
1018
+ "loss_nan_ranks": 0,
1019
+ "loss_rank_avg": 0.16957998275756836,
1020
+ "step": 460,
1021
+ "valid_targets_mean": 4358.4,
1022
+ "valid_targets_min": 705
1023
+ },
1024
+ {
1025
+ "epoch": 1.4864,
1026
+ "grad_norm": 0.4969730194262121,
1027
+ "learning_rate": 3.5488480293922e-05,
1028
+ "loss": 0.3526,
1029
+ "loss_nan_ranks": 0,
1030
+ "loss_rank_avg": 0.14323380589485168,
1031
+ "step": 465,
1032
+ "valid_targets_mean": 3332.0,
1033
+ "valid_targets_min": 979
1034
+ },
1035
+ {
1036
+ "epoch": 1.5024,
1037
+ "grad_norm": 0.3878219097905555,
1038
+ "learning_rate": 3.5346355894832515e-05,
1039
+ "loss": 0.3929,
1040
+ "loss_nan_ranks": 0,
1041
+ "loss_rank_avg": 0.25531578063964844,
1042
+ "step": 470,
1043
+ "valid_targets_mean": 4687.0,
1044
+ "valid_targets_min": 699
1045
+ },
1046
+ {
1047
+ "epoch": 1.5184,
1048
+ "grad_norm": 0.41901557134182976,
1049
+ "learning_rate": 3.520232148765774e-05,
1050
+ "loss": 0.3579,
1051
+ "loss_nan_ranks": 0,
1052
+ "loss_rank_avg": 0.15701517462730408,
1053
+ "step": 475,
1054
+ "valid_targets_mean": 3616.7,
1055
+ "valid_targets_min": 660
1056
+ },
1057
+ {
1058
+ "epoch": 1.5344,
1059
+ "grad_norm": 0.3337566112172416,
1060
+ "learning_rate": 3.505639499892591e-05,
1061
+ "loss": 0.3606,
1062
+ "loss_nan_ranks": 0,
1063
+ "loss_rank_avg": 0.14276567101478577,
1064
+ "step": 480,
1065
+ "valid_targets_mean": 4460.1,
1066
+ "valid_targets_min": 705
1067
+ },
1068
+ {
1069
+ "epoch": 1.5504,
1070
+ "grad_norm": 0.376009534019795,
1071
+ "learning_rate": 3.490859459065382e-05,
1072
+ "loss": 0.3762,
1073
+ "loss_nan_ranks": 0,
1074
+ "loss_rank_avg": 0.18227124214172363,
1075
+ "step": 485,
1076
+ "valid_targets_mean": 4140.9,
1077
+ "valid_targets_min": 927
1078
+ },
1079
+ {
1080
+ "epoch": 1.5664,
1081
+ "grad_norm": 0.3657848949268241,
1082
+ "learning_rate": 3.475893865808633e-05,
1083
+ "loss": 0.3859,
1084
+ "loss_nan_ranks": 0,
1085
+ "loss_rank_avg": 0.1826404631137848,
1086
+ "step": 490,
1087
+ "valid_targets_mean": 3706.3,
1088
+ "valid_targets_min": 692
1089
+ },
1090
+ {
1091
+ "epoch": 1.5824,
1092
+ "grad_norm": 0.4325898106752329,
1093
+ "learning_rate": 3.4607445827406984e-05,
1094
+ "loss": 0.4181,
1095
+ "loss_nan_ranks": 0,
1096
+ "loss_rank_avg": 0.23097142577171326,
1097
+ "step": 495,
1098
+ "valid_targets_mean": 3480.9,
1099
+ "valid_targets_min": 832
1100
+ },
1101
+ {
1102
+ "epoch": 1.5984,
1103
+ "grad_norm": 0.2802872024830924,
1104
+ "learning_rate": 3.445413495341971e-05,
1105
+ "loss": 0.363,
1106
+ "loss_nan_ranks": 0,
1107
+ "loss_rank_avg": 0.13881602883338928,
1108
+ "step": 500,
1109
+ "valid_targets_mean": 5370.1,
1110
+ "valid_targets_min": 430
1111
+ },
1112
+ {
1113
+ "epoch": 1.6143999999999998,
1114
+ "grad_norm": 0.43602069232557744,
1115
+ "learning_rate": 3.429902511720216e-05,
1116
+ "loss": 0.3927,
1117
+ "loss_nan_ranks": 0,
1118
+ "loss_rank_avg": 0.23958256840705872,
1119
+ "step": 505,
1120
+ "valid_targets_mean": 3277.2,
1121
+ "valid_targets_min": 1050
1122
+ },
1123
+ {
1124
+ "epoch": 1.6303999999999998,
1125
+ "grad_norm": 0.41936014097791685,
1126
+ "learning_rate": 3.4142135623730954e-05,
1127
+ "loss": 0.3791,
1128
+ "loss_nan_ranks": 0,
1129
+ "loss_rank_avg": 0.15627846121788025,
1130
+ "step": 510,
1131
+ "valid_targets_mean": 4432.1,
1132
+ "valid_targets_min": 558
1133
+ },
1134
+ {
1135
+ "epoch": 1.6463999999999999,
1136
+ "grad_norm": 0.3358679962436761,
1137
+ "learning_rate": 3.398348599947888e-05,
1138
+ "loss": 0.3754,
1139
+ "loss_nan_ranks": 0,
1140
+ "loss_rank_avg": 0.132685586810112,
1141
+ "step": 515,
1142
+ "valid_targets_mean": 4297.8,
1143
+ "valid_targets_min": 707
1144
+ },
1145
+ {
1146
+ "epoch": 1.6623999999999999,
1147
+ "grad_norm": 0.338992772693464,
1148
+ "learning_rate": 3.3823095989984697e-05,
1149
+ "loss": 0.41,
1150
+ "loss_nan_ranks": 0,
1151
+ "loss_rank_avg": 0.1532086730003357,
1152
+ "step": 520,
1153
+ "valid_targets_mean": 3837.9,
1154
+ "valid_targets_min": 779
1155
+ },
1156
+ {
1157
+ "epoch": 1.6784,
1158
+ "grad_norm": 0.39162023399835316,
1159
+ "learning_rate": 3.366098555739557e-05,
1160
+ "loss": 0.3623,
1161
+ "loss_nan_ranks": 0,
1162
+ "loss_rank_avg": 0.16554811596870422,
1163
+ "step": 525,
1164
+ "valid_targets_mean": 4507.1,
1165
+ "valid_targets_min": 867
1166
+ },
1167
+ {
1168
+ "epoch": 1.6944,
1169
+ "grad_norm": 0.41290820076532175,
1170
+ "learning_rate": 3.349717487798261e-05,
1171
+ "loss": 0.3722,
1172
+ "loss_nan_ranks": 0,
1173
+ "loss_rank_avg": 0.20561575889587402,
1174
+ "step": 530,
1175
+ "valid_targets_mean": 3697.0,
1176
+ "valid_targets_min": 764
1177
+ },
1178
+ {
1179
+ "epoch": 1.7104,
1180
+ "grad_norm": 0.3494036698637042,
1181
+ "learning_rate": 3.3331684339629706e-05,
1182
+ "loss": 0.3734,
1183
+ "loss_nan_ranks": 0,
1184
+ "loss_rank_avg": 0.15987402200698853,
1185
+ "step": 535,
1186
+ "valid_targets_mean": 4119.2,
1187
+ "valid_targets_min": 668
1188
+ },
1189
+ {
1190
+ "epoch": 1.7264,
1191
+ "grad_norm": 0.47807369844807185,
1192
+ "learning_rate": 3.3164534539296056e-05,
1193
+ "loss": 0.3747,
1194
+ "loss_nan_ranks": 0,
1195
+ "loss_rank_avg": 0.16826443374156952,
1196
+ "step": 540,
1197
+ "valid_targets_mean": 3707.9,
1198
+ "valid_targets_min": 763
1199
+ },
1200
+ {
1201
+ "epoch": 1.7424,
1202
+ "grad_norm": 0.3361689358143466,
1203
+ "learning_rate": 3.299574628045269e-05,
1204
+ "loss": 0.3615,
1205
+ "loss_nan_ranks": 0,
1206
+ "loss_rank_avg": 0.1361405998468399,
1207
+ "step": 545,
1208
+ "valid_targets_mean": 3589.8,
1209
+ "valid_targets_min": 599
1210
+ },
1211
+ {
1212
+ "epoch": 1.7584,
1213
+ "grad_norm": 0.37484427799298453,
1214
+ "learning_rate": 3.282534057049322e-05,
1215
+ "loss": 0.3349,
1216
+ "loss_nan_ranks": 0,
1217
+ "loss_rank_avg": 0.16627568006515503,
1218
+ "step": 550,
1219
+ "valid_targets_mean": 3740.7,
1220
+ "valid_targets_min": 957
1221
+ },
1222
+ {
1223
+ "epoch": 1.7744,
1224
+ "grad_norm": 0.37363103188547103,
1225
+ "learning_rate": 3.265333861811933e-05,
1226
+ "loss": 0.3792,
1227
+ "loss_nan_ranks": 0,
1228
+ "loss_rank_avg": 0.18092834949493408,
1229
+ "step": 555,
1230
+ "valid_targets_mean": 3324.7,
1231
+ "valid_targets_min": 695
1232
+ },
1233
+ {
1234
+ "epoch": 1.7904,
1235
+ "grad_norm": 0.3839625364420311,
1236
+ "learning_rate": 3.2479761830701075e-05,
1237
+ "loss": 0.3918,
1238
+ "loss_nan_ranks": 0,
1239
+ "loss_rank_avg": 0.21519896388053894,
1240
+ "step": 560,
1241
+ "valid_targets_mean": 3921.2,
1242
+ "valid_targets_min": 606
1243
+ },
1244
+ {
1245
+ "epoch": 1.8064,
1246
+ "grad_norm": 0.37568764545151495,
1247
+ "learning_rate": 3.230463181161254e-05,
1248
+ "loss": 0.3933,
1249
+ "loss_nan_ranks": 0,
1250
+ "loss_rank_avg": 0.1830562949180603,
1251
+ "step": 565,
1252
+ "valid_targets_mean": 3310.1,
1253
+ "valid_targets_min": 711
1254
+ },
1255
+ {
1256
+ "epoch": 1.8224,
1257
+ "grad_norm": 0.3288521673537427,
1258
+ "learning_rate": 3.212797035754311e-05,
1259
+ "loss": 0.3873,
1260
+ "loss_nan_ranks": 0,
1261
+ "loss_rank_avg": 0.1765051782131195,
1262
+ "step": 570,
1263
+ "valid_targets_mean": 4186.1,
1264
+ "valid_targets_min": 656
1265
+ },
1266
+ {
1267
+ "epoch": 1.8384,
1268
+ "grad_norm": 0.327471713865092,
1269
+ "learning_rate": 3.194979945578461e-05,
1270
+ "loss": 0.3926,
1271
+ "loss_nan_ranks": 0,
1272
+ "loss_rank_avg": 0.18825772404670715,
1273
+ "step": 575,
1274
+ "valid_targets_mean": 5375.1,
1275
+ "valid_targets_min": 724
1276
+ },
1277
+ {
1278
+ "epoch": 1.8544,
1279
+ "grad_norm": 0.3373202744506304,
1280
+ "learning_rate": 3.177014128149479e-05,
1281
+ "loss": 0.3546,
1282
+ "loss_nan_ranks": 0,
1283
+ "loss_rank_avg": 0.19015300273895264,
1284
+ "step": 580,
1285
+ "valid_targets_mean": 4340.6,
1286
+ "valid_targets_min": 679
1287
+ },
1288
+ {
1289
+ "epoch": 1.8704,
1290
+ "grad_norm": 0.3469982372901262,
1291
+ "learning_rate": 3.158901819493742e-05,
1292
+ "loss": 0.3814,
1293
+ "loss_nan_ranks": 0,
1294
+ "loss_rank_avg": 0.22717589139938354,
1295
+ "step": 585,
1296
+ "valid_targets_mean": 6144.8,
1297
+ "valid_targets_min": 626
1298
+ },
1299
+ {
1300
+ "epoch": 1.8864,
1301
+ "grad_norm": 0.37258492423632356,
1302
+ "learning_rate": 3.1406452738699284e-05,
1303
+ "loss": 0.3693,
1304
+ "loss_nan_ranks": 0,
1305
+ "loss_rank_avg": 0.1856401562690735,
1306
+ "step": 590,
1307
+ "valid_targets_mean": 3395.8,
1308
+ "valid_targets_min": 659
1309
+ },
1310
+ {
1311
+ "epoch": 1.9024,
1312
+ "grad_norm": 0.3379882445810335,
1313
+ "learning_rate": 3.122246763488457e-05,
1314
+ "loss": 0.3629,
1315
+ "loss_nan_ranks": 0,
1316
+ "loss_rank_avg": 0.1680774986743927,
1317
+ "step": 595,
1318
+ "valid_targets_mean": 4609.5,
1319
+ "valid_targets_min": 585
1320
+ },
1321
+ {
1322
+ "epoch": 1.9184,
1323
+ "grad_norm": 0.3436516413868104,
1324
+ "learning_rate": 3.103708578228686e-05,
1325
+ "loss": 0.3835,
1326
+ "loss_nan_ranks": 0,
1327
+ "loss_rank_avg": 0.1666383296251297,
1328
+ "step": 600,
1329
+ "valid_targets_mean": 3579.3,
1330
+ "valid_targets_min": 943
1331
+ },
1332
+ {
1333
+ "epoch": 1.9344000000000001,
1334
+ "grad_norm": 0.4313763145949012,
1335
+ "learning_rate": 3.085033025353915e-05,
1336
+ "loss": 0.3808,
1337
+ "loss_nan_ranks": 0,
1338
+ "loss_rank_avg": 0.19704148173332214,
1339
+ "step": 605,
1340
+ "valid_targets_mean": 2658.1,
1341
+ "valid_targets_min": 639
1342
+ },
1343
+ {
1344
+ "epoch": 1.9504000000000001,
1345
+ "grad_norm": 0.37657788301827405,
1346
+ "learning_rate": 3.066222429224221e-05,
1347
+ "loss": 0.3776,
1348
+ "loss_nan_ranks": 0,
1349
+ "loss_rank_avg": 0.16659879684448242,
1350
+ "step": 610,
1351
+ "valid_targets_mean": 3304.2,
1352
+ "valid_targets_min": 661
1353
+ },
1354
+ {
1355
+ "epoch": 1.9664000000000001,
1356
+ "grad_norm": 0.4058483842532848,
1357
+ "learning_rate": 3.047279131007173e-05,
1358
+ "loss": 0.3716,
1359
+ "loss_nan_ranks": 0,
1360
+ "loss_rank_avg": 0.20012781023979187,
1361
+ "step": 615,
1362
+ "valid_targets_mean": 2792.8,
1363
+ "valid_targets_min": 827
1364
+ },
1365
+ {
1366
+ "epoch": 1.9824000000000002,
1367
+ "grad_norm": 0.412660563012823,
1368
+ "learning_rate": 3.0282054883864434e-05,
1369
+ "loss": 0.3624,
1370
+ "loss_nan_ranks": 0,
1371
+ "loss_rank_avg": 0.1812940537929535,
1372
+ "step": 620,
1373
+ "valid_targets_mean": 2277.3,
1374
+ "valid_targets_min": 666
1375
+ },
1376
+ {
1377
+ "epoch": 1.9984,
1378
+ "grad_norm": 0.32332051288280694,
1379
+ "learning_rate": 3.009003875268379e-05,
1380
+ "loss": 0.3744,
1381
+ "loss_nan_ranks": 0,
1382
+ "loss_rank_avg": 0.12370559573173523,
1383
+ "step": 625,
1384
+ "valid_targets_mean": 2962.4,
1385
+ "valid_targets_min": 866
1386
+ },
1387
+ {
1388
+ "epoch": 2.0128,
1389
+ "grad_norm": 0.3552382763657628,
1390
+ "learning_rate": 2.9896766814865355e-05,
1391
+ "loss": 0.3532,
1392
+ "loss_nan_ranks": 0,
1393
+ "loss_rank_avg": 0.14251808822155,
1394
+ "step": 630,
1395
+ "valid_targets_mean": 2889.8,
1396
+ "valid_targets_min": 829
1397
+ },
1398
+ {
1399
+ "epoch": 2.0288,
1400
+ "grad_norm": 0.3663974186319015,
1401
+ "learning_rate": 2.970226312504246e-05,
1402
+ "loss": 0.3675,
1403
+ "loss_nan_ranks": 0,
1404
+ "loss_rank_avg": 0.15324479341506958,
1405
+ "step": 635,
1406
+ "valid_targets_mean": 3010.3,
1407
+ "valid_targets_min": 768
1408
+ },
1409
+ {
1410
+ "epoch": 2.0448,
1411
+ "grad_norm": 0.39187020362125585,
1412
+ "learning_rate": 2.9506551891152334e-05,
1413
+ "loss": 0.378,
1414
+ "loss_nan_ranks": 0,
1415
+ "loss_rank_avg": 0.26267358660697937,
1416
+ "step": 640,
1417
+ "valid_targets_mean": 4856.9,
1418
+ "valid_targets_min": 570
1419
+ },
1420
+ {
1421
+ "epoch": 2.0608,
1422
+ "grad_norm": 0.31957017059314285,
1423
+ "learning_rate": 2.930965747142319e-05,
1424
+ "loss": 0.3305,
1425
+ "loss_nan_ranks": 0,
1426
+ "loss_rank_avg": 0.16857537627220154,
1427
+ "step": 645,
1428
+ "valid_targets_mean": 4703.2,
1429
+ "valid_targets_min": 829
1430
+ },
1431
+ {
1432
+ "epoch": 2.0768,
1433
+ "grad_norm": 0.40369627273889996,
1434
+ "learning_rate": 2.9111604371342593e-05,
1435
+ "loss": 0.337,
1436
+ "loss_nan_ranks": 0,
1437
+ "loss_rank_avg": 0.15384268760681152,
1438
+ "step": 650,
1439
+ "valid_targets_mean": 2815.1,
1440
+ "valid_targets_min": 699
1441
+ },
1442
+ {
1443
+ "epoch": 2.0928,
1444
+ "grad_norm": 0.46412451317350417,
1445
+ "learning_rate": 2.891241724060752e-05,
1446
+ "loss": 0.3493,
1447
+ "loss_nan_ranks": 0,
1448
+ "loss_rank_avg": 0.18567049503326416,
1449
+ "step": 655,
1450
+ "valid_targets_mean": 4328.1,
1451
+ "valid_targets_min": 814
1452
+ },
1453
+ {
1454
+ "epoch": 2.1088,
1455
+ "grad_norm": 0.354455219931927,
1456
+ "learning_rate": 2.8712120870056455e-05,
1457
+ "loss": 0.3565,
1458
+ "loss_nan_ranks": 0,
1459
+ "loss_rank_avg": 0.17629089951515198,
1460
+ "step": 660,
1461
+ "valid_targets_mean": 4897.3,
1462
+ "valid_targets_min": 740
1463
+ },
1464
+ {
1465
+ "epoch": 2.1248,
1466
+ "grad_norm": 0.32558707469954473,
1467
+ "learning_rate": 2.851074018858389e-05,
1468
+ "loss": 0.3671,
1469
+ "loss_nan_ranks": 0,
1470
+ "loss_rank_avg": 0.17213571071624756,
1471
+ "step": 665,
1472
+ "valid_targets_mean": 5233.5,
1473
+ "valid_targets_min": 792
1474
+ },
1475
+ {
1476
+ "epoch": 2.1408,
1477
+ "grad_norm": 0.42136092165410943,
1478
+ "learning_rate": 2.8308300260037734e-05,
1479
+ "loss": 0.3606,
1480
+ "loss_nan_ranks": 0,
1481
+ "loss_rank_avg": 0.17917993664741516,
1482
+ "step": 670,
1483
+ "valid_targets_mean": 3339.1,
1484
+ "valid_targets_min": 1000
1485
+ },
1486
+ {
1487
+ "epoch": 2.1568,
1488
+ "grad_norm": 0.3456749974395744,
1489
+ "learning_rate": 2.8104826280099796e-05,
1490
+ "loss": 0.3769,
1491
+ "loss_nan_ranks": 0,
1492
+ "loss_rank_avg": 0.16497279703617096,
1493
+ "step": 675,
1494
+ "valid_targets_mean": 4999.3,
1495
+ "valid_targets_min": 1058
1496
+ },
1497
+ {
1498
+ "epoch": 2.1728,
1499
+ "grad_norm": 0.47636949292183406,
1500
+ "learning_rate": 2.7900343573150003e-05,
1501
+ "loss": 0.3549,
1502
+ "loss_nan_ranks": 0,
1503
+ "loss_rank_avg": 0.16123411059379578,
1504
+ "step": 680,
1505
+ "valid_targets_mean": 3640.6,
1506
+ "valid_targets_min": 521
1507
+ },
1508
+ {
1509
+ "epoch": 2.1888,
1510
+ "grad_norm": 0.3802344366234923,
1511
+ "learning_rate": 2.7694877589114442e-05,
1512
+ "loss": 0.3675,
1513
+ "loss_nan_ranks": 0,
1514
+ "loss_rank_avg": 0.15201343595981598,
1515
+ "step": 685,
1516
+ "valid_targets_mean": 3488.5,
1517
+ "valid_targets_min": 661
1518
+ },
1519
+ {
1520
+ "epoch": 2.2048,
1521
+ "grad_norm": 0.3889010529697492,
1522
+ "learning_rate": 2.748845390029794e-05,
1523
+ "loss": 0.3462,
1524
+ "loss_nan_ranks": 0,
1525
+ "loss_rank_avg": 0.136835515499115,
1526
+ "step": 690,
1527
+ "valid_targets_mean": 3073.3,
1528
+ "valid_targets_min": 1053
1529
+ },
1530
+ {
1531
+ "epoch": 2.2208,
1532
+ "grad_norm": 0.31268069502172785,
1533
+ "learning_rate": 2.728109819820129e-05,
1534
+ "loss": 0.353,
1535
+ "loss_nan_ranks": 0,
1536
+ "loss_rank_avg": 0.1584779918193817,
1537
+ "step": 695,
1538
+ "valid_targets_mean": 5115.0,
1539
+ "valid_targets_min": 643
1540
+ },
1541
+ {
1542
+ "epoch": 2.2368,
1543
+ "grad_norm": 0.3517489537279622,
1544
+ "learning_rate": 2.7072836290323698e-05,
1545
+ "loss": 0.3614,
1546
+ "loss_nan_ranks": 0,
1547
+ "loss_rank_avg": 0.13909167051315308,
1548
+ "step": 700,
1549
+ "valid_targets_mean": 4441.3,
1550
+ "valid_targets_min": 846
1551
+ },
1552
+ {
1553
+ "epoch": 2.2528,
1554
+ "grad_norm": 0.42108941278672635,
1555
+ "learning_rate": 2.6863694096950763e-05,
1556
+ "loss": 0.3447,
1557
+ "loss_nan_ranks": 0,
1558
+ "loss_rank_avg": 0.17374692857265472,
1559
+ "step": 705,
1560
+ "valid_targets_mean": 4347.6,
1561
+ "valid_targets_min": 738
1562
+ },
1563
+ {
1564
+ "epoch": 2.2688,
1565
+ "grad_norm": 0.3854671122527939,
1566
+ "learning_rate": 2.6653697647928485e-05,
1567
+ "loss": 0.3676,
1568
+ "loss_nan_ranks": 0,
1569
+ "loss_rank_avg": 0.18790662288665771,
1570
+ "step": 710,
1571
+ "valid_targets_mean": 3574.8,
1572
+ "valid_targets_min": 971
1573
+ },
1574
+ {
1575
+ "epoch": 2.2848,
1576
+ "grad_norm": 0.3325381979319747,
1577
+ "learning_rate": 2.644287307942352e-05,
1578
+ "loss": 0.357,
1579
+ "loss_nan_ranks": 0,
1580
+ "loss_rank_avg": 0.2031068354845047,
1581
+ "step": 715,
1582
+ "valid_targets_mean": 6072.4,
1583
+ "valid_targets_min": 1032
1584
+ },
1585
+ {
1586
+ "epoch": 2.3008,
1587
+ "grad_norm": 0.38008293980011126,
1588
+ "learning_rate": 2.623124663067034e-05,
1589
+ "loss": 0.3578,
1590
+ "loss_nan_ranks": 0,
1591
+ "loss_rank_avg": 0.17572256922721863,
1592
+ "step": 720,
1593
+ "valid_targets_mean": 4194.4,
1594
+ "valid_targets_min": 875
1595
+ },
1596
+ {
1597
+ "epoch": 2.3168,
1598
+ "grad_norm": 0.4090255604288786,
1599
+ "learning_rate": 2.6018844640705448e-05,
1600
+ "loss": 0.3539,
1601
+ "loss_nan_ranks": 0,
1602
+ "loss_rank_avg": 0.18574786186218262,
1603
+ "step": 725,
1604
+ "valid_targets_mean": 5218.9,
1605
+ "valid_targets_min": 1056
1606
+ },
1607
+ {
1608
+ "epoch": 2.3327999999999998,
1609
+ "grad_norm": 0.3695725965918268,
1610
+ "learning_rate": 2.580569354508925e-05,
1611
+ "loss": 0.3543,
1612
+ "loss_nan_ranks": 0,
1613
+ "loss_rank_avg": 0.19336247444152832,
1614
+ "step": 730,
1615
+ "valid_targets_mean": 3839.9,
1616
+ "valid_targets_min": 734
1617
+ },
1618
+ {
1619
+ "epoch": 2.3487999999999998,
1620
+ "grad_norm": 0.4037004371230533,
1621
+ "learning_rate": 2.5591819872615856e-05,
1622
+ "loss": 0.344,
1623
+ "loss_nan_ranks": 0,
1624
+ "loss_rank_avg": 0.1890317052602768,
1625
+ "step": 735,
1626
+ "valid_targets_mean": 3146.3,
1627
+ "valid_targets_min": 777
1628
+ },
1629
+ {
1630
+ "epoch": 2.3648,
1631
+ "grad_norm": 0.42034002047772934,
1632
+ "learning_rate": 2.5377250242011338e-05,
1633
+ "loss": 0.3671,
1634
+ "loss_nan_ranks": 0,
1635
+ "loss_rank_avg": 0.21791379153728485,
1636
+ "step": 740,
1637
+ "valid_targets_mean": 4096.5,
1638
+ "valid_targets_min": 744
1639
+ },
1640
+ {
1641
+ "epoch": 2.3808,
1642
+ "grad_norm": 0.42506199410875983,
1643
+ "learning_rate": 2.516201135862073e-05,
1644
+ "loss": 0.3339,
1645
+ "loss_nan_ranks": 0,
1646
+ "loss_rank_avg": 0.18126055598258972,
1647
+ "step": 745,
1648
+ "valid_targets_mean": 4013.9,
1649
+ "valid_targets_min": 1198
1650
+ },
1651
+ {
1652
+ "epoch": 2.3968,
1653
+ "grad_norm": 0.34527703907607993,
1654
+ "learning_rate": 2.494613001108431e-05,
1655
+ "loss": 0.3543,
1656
+ "loss_nan_ranks": 0,
1657
+ "loss_rank_avg": 0.12390469014644623,
1658
+ "step": 750,
1659
+ "valid_targets_mean": 3295.7,
1660
+ "valid_targets_min": 752
1661
+ },
1662
+ {
1663
+ "epoch": 2.4128,
1664
+ "grad_norm": 0.34904761415566266,
1665
+ "learning_rate": 2.4729633068003466e-05,
1666
+ "loss": 0.3474,
1667
+ "loss_nan_ranks": 0,
1668
+ "loss_rank_avg": 0.1934797316789627,
1669
+ "step": 755,
1670
+ "valid_targets_mean": 4930.4,
1671
+ "valid_targets_min": 899
1672
+ },
1673
+ {
1674
+ "epoch": 2.4288,
1675
+ "grad_norm": 0.38001649074269783,
1676
+ "learning_rate": 2.4512547474596624e-05,
1677
+ "loss": 0.3413,
1678
+ "loss_nan_ranks": 0,
1679
+ "loss_rank_avg": 0.2003239393234253,
1680
+ "step": 760,
1681
+ "valid_targets_mean": 3611.4,
1682
+ "valid_targets_min": 741
1683
+ },
1684
+ {
1685
+ "epoch": 2.4448,
1686
+ "grad_norm": 0.3518099712444648,
1687
+ "learning_rate": 2.429490024934566e-05,
1688
+ "loss": 0.3511,
1689
+ "loss_nan_ranks": 0,
1690
+ "loss_rank_avg": 0.15470391511917114,
1691
+ "step": 765,
1692
+ "valid_targets_mean": 3514.5,
1693
+ "valid_targets_min": 972
1694
+ },
1695
+ {
1696
+ "epoch": 2.4608,
1697
+ "grad_norm": 0.3448887159515848,
1698
+ "learning_rate": 2.4076718480633178e-05,
1699
+ "loss": 0.358,
1700
+ "loss_nan_ranks": 0,
1701
+ "loss_rank_avg": 0.16293978691101074,
1702
+ "step": 770,
1703
+ "valid_targets_mean": 3817.4,
1704
+ "valid_targets_min": 848
1705
+ },
1706
+ {
1707
+ "epoch": 2.4768,
1708
+ "grad_norm": 0.3636861883493942,
1709
+ "learning_rate": 2.3858029323371067e-05,
1710
+ "loss": 0.3526,
1711
+ "loss_nan_ranks": 0,
1712
+ "loss_rank_avg": 0.16884376108646393,
1713
+ "step": 775,
1714
+ "valid_targets_mean": 3655.5,
1715
+ "valid_targets_min": 687
1716
+ },
1717
+ {
1718
+ "epoch": 2.4928,
1719
+ "grad_norm": 0.36593321086782454,
1720
+ "learning_rate": 2.363885999562084e-05,
1721
+ "loss": 0.3737,
1722
+ "loss_nan_ranks": 0,
1723
+ "loss_rank_avg": 0.12894733250141144,
1724
+ "step": 780,
1725
+ "valid_targets_mean": 2824.6,
1726
+ "valid_targets_min": 758
1727
+ },
1728
+ {
1729
+ "epoch": 2.5088,
1730
+ "grad_norm": 0.48954320211825053,
1731
+ "learning_rate": 2.3419237775206026e-05,
1732
+ "loss": 0.3356,
1733
+ "loss_nan_ranks": 0,
1734
+ "loss_rank_avg": 0.1467287838459015,
1735
+ "step": 785,
1736
+ "valid_targets_mean": 4619.7,
1737
+ "valid_targets_min": 613
1738
+ },
1739
+ {
1740
+ "epoch": 2.5248,
1741
+ "grad_norm": 0.38004730454836627,
1742
+ "learning_rate": 2.3199189996317205e-05,
1743
+ "loss": 0.3398,
1744
+ "loss_nan_ranks": 0,
1745
+ "loss_rank_avg": 0.1634030044078827,
1746
+ "step": 790,
1747
+ "valid_targets_mean": 2998.8,
1748
+ "valid_targets_min": 767
1749
+ },
1750
+ {
1751
+ "epoch": 2.5408,
1752
+ "grad_norm": 0.4280336673825541,
1753
+ "learning_rate": 2.297874404610998e-05,
1754
+ "loss": 0.3531,
1755
+ "loss_nan_ranks": 0,
1756
+ "loss_rank_avg": 0.15717938542366028,
1757
+ "step": 795,
1758
+ "valid_targets_mean": 2775.2,
1759
+ "valid_targets_min": 679
1760
+ },
1761
+ {
1762
+ "epoch": 2.5568,
1763
+ "grad_norm": 0.3977995070636329,
1764
+ "learning_rate": 2.2757927361296376e-05,
1765
+ "loss": 0.3555,
1766
+ "loss_nan_ranks": 0,
1767
+ "loss_rank_avg": 0.17521412670612335,
1768
+ "step": 800,
1769
+ "valid_targets_mean": 4207.9,
1770
+ "valid_targets_min": 920
1771
+ },
1772
+ {
1773
+ "epoch": 2.5728,
1774
+ "grad_norm": 0.3767564140590417,
1775
+ "learning_rate": 2.2536767424730052e-05,
1776
+ "loss": 0.3813,
1777
+ "loss_nan_ranks": 0,
1778
+ "loss_rank_avg": 0.18163838982582092,
1779
+ "step": 805,
1780
+ "valid_targets_mean": 4502.3,
1781
+ "valid_targets_min": 676
1782
+ },
1783
+ {
1784
+ "epoch": 2.5888,
1785
+ "grad_norm": 0.33598923045311496,
1786
+ "learning_rate": 2.2315291761985803e-05,
1787
+ "loss": 0.3588,
1788
+ "loss_nan_ranks": 0,
1789
+ "loss_rank_avg": 0.19112834334373474,
1790
+ "step": 810,
1791
+ "valid_targets_mean": 4212.1,
1792
+ "valid_targets_min": 701
1793
+ },
1794
+ {
1795
+ "epoch": 2.6048,
1796
+ "grad_norm": 0.3321302659711612,
1797
+ "learning_rate": 2.2093527937933716e-05,
1798
+ "loss": 0.3554,
1799
+ "loss_nan_ranks": 0,
1800
+ "loss_rank_avg": 0.18555206060409546,
1801
+ "step": 815,
1802
+ "valid_targets_mean": 5138.4,
1803
+ "valid_targets_min": 903
1804
+ },
1805
+ {
1806
+ "epoch": 2.6208,
1807
+ "grad_norm": 0.4332190176537359,
1808
+ "learning_rate": 2.1871503553308447e-05,
1809
+ "loss": 0.3716,
1810
+ "loss_nan_ranks": 0,
1811
+ "loss_rank_avg": 0.24525032937526703,
1812
+ "step": 820,
1813
+ "valid_targets_mean": 3339.2,
1814
+ "valid_targets_min": 530
1815
+ },
1816
+ {
1817
+ "epoch": 2.6368,
1818
+ "grad_norm": 0.3356538304748196,
1819
+ "learning_rate": 2.164924624127403e-05,
1820
+ "loss": 0.3841,
1821
+ "loss_nan_ranks": 0,
1822
+ "loss_rank_avg": 0.1583971232175827,
1823
+ "step": 825,
1824
+ "valid_targets_mean": 4937.4,
1825
+ "valid_targets_min": 780
1826
+ },
1827
+ {
1828
+ "epoch": 2.6528,
1829
+ "grad_norm": 0.43608081245236874,
1830
+ "learning_rate": 2.1426783663984648e-05,
1831
+ "loss": 0.3584,
1832
+ "loss_nan_ranks": 0,
1833
+ "loss_rank_avg": 0.20730876922607422,
1834
+ "step": 830,
1835
+ "valid_targets_mean": 3397.2,
1836
+ "valid_targets_min": 572
1837
+ },
1838
+ {
1839
+ "epoch": 2.6688,
1840
+ "grad_norm": 0.44955532042396595,
1841
+ "learning_rate": 2.1204143509141818e-05,
1842
+ "loss": 0.3553,
1843
+ "loss_nan_ranks": 0,
1844
+ "loss_rank_avg": 0.15707382559776306,
1845
+ "step": 835,
1846
+ "valid_targets_mean": 4469.5,
1847
+ "valid_targets_min": 716
1848
+ },
1849
+ {
1850
+ "epoch": 2.6848,
1851
+ "grad_norm": 0.5050914114602801,
1852
+ "learning_rate": 2.0981353486548363e-05,
1853
+ "loss": 0.3745,
1854
+ "loss_nan_ranks": 0,
1855
+ "loss_rank_avg": 0.20182707905769348,
1856
+ "step": 840,
1857
+ "valid_targets_mean": 3843.1,
1858
+ "valid_targets_min": 1064
1859
+ },
1860
+ {
1861
+ "epoch": 2.7008,
1862
+ "grad_norm": 0.39149794144314437,
1863
+ "learning_rate": 2.075844132465964e-05,
1864
+ "loss": 0.3687,
1865
+ "loss_nan_ranks": 0,
1866
+ "loss_rank_avg": 0.1607154756784439,
1867
+ "step": 845,
1868
+ "valid_targets_mean": 3015.9,
1869
+ "valid_targets_min": 787
1870
+ },
1871
+ {
1872
+ "epoch": 2.7168,
1873
+ "grad_norm": 0.38997342582901406,
1874
+ "learning_rate": 2.0535434767132495e-05,
1875
+ "loss": 0.361,
1876
+ "loss_nan_ranks": 0,
1877
+ "loss_rank_avg": 0.2090197205543518,
1878
+ "step": 850,
1879
+ "valid_targets_mean": 3698.9,
1880
+ "valid_targets_min": 523
1881
+ },
1882
+ {
1883
+ "epoch": 2.7328,
1884
+ "grad_norm": 0.3588635051303132,
1885
+ "learning_rate": 2.0312361569372215e-05,
1886
+ "loss": 0.372,
1887
+ "loss_nan_ranks": 0,
1888
+ "loss_rank_avg": 0.20288428664207458,
1889
+ "step": 855,
1890
+ "valid_targets_mean": 4794.2,
1891
+ "valid_targets_min": 624
1892
+ },
1893
+ {
1894
+ "epoch": 2.7488,
1895
+ "grad_norm": 0.36454271223668117,
1896
+ "learning_rate": 2.0089249495078186e-05,
1897
+ "loss": 0.3525,
1898
+ "loss_nan_ranks": 0,
1899
+ "loss_rank_avg": 0.14510402083396912,
1900
+ "step": 860,
1901
+ "valid_targets_mean": 3878.5,
1902
+ "valid_targets_min": 647
1903
+ },
1904
+ {
1905
+ "epoch": 2.7648,
1906
+ "grad_norm": 0.3980778917765405,
1907
+ "learning_rate": 1.9866126312788333e-05,
1908
+ "loss": 0.3439,
1909
+ "loss_nan_ranks": 0,
1910
+ "loss_rank_avg": 0.17586301267147064,
1911
+ "step": 865,
1912
+ "valid_targets_mean": 3193.8,
1913
+ "valid_targets_min": 734
1914
+ },
1915
+ {
1916
+ "epoch": 2.7808,
1917
+ "grad_norm": 0.38107961900162757,
1918
+ "learning_rate": 1.964301979242308e-05,
1919
+ "loss": 0.3571,
1920
+ "loss_nan_ranks": 0,
1921
+ "loss_rank_avg": 0.1435280442237854,
1922
+ "step": 870,
1923
+ "valid_targets_mean": 3100.1,
1924
+ "valid_targets_min": 757
1925
+ },
1926
+ {
1927
+ "epoch": 2.7968,
1928
+ "grad_norm": 0.3421598363206481,
1929
+ "learning_rate": 1.9419957701829138e-05,
1930
+ "loss": 0.3594,
1931
+ "loss_nan_ranks": 0,
1932
+ "loss_rank_avg": 0.18592098355293274,
1933
+ "step": 875,
1934
+ "valid_targets_mean": 5316.6,
+ "valid_targets_min": 665
+ },
+ {
+ "epoch": 2.8128,
+ "grad_norm": 0.3625020704663103,
+ "learning_rate": 1.9196967803323464e-05,
+ "loss": 0.36,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1660204827785492,
+ "step": 880,
+ "valid_targets_mean": 3241.3,
+ "valid_targets_min": 739
+ },
+ {
+ "epoch": 2.8288,
+ "grad_norm": 0.3844458210877275,
+ "learning_rate": 1.8974077850237983e-05,
+ "loss": 0.3501,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2059369534254074,
+ "step": 885,
+ "valid_targets_mean": 4584.8,
+ "valid_targets_min": 816
+ },
+ {
+ "epoch": 2.8448,
+ "grad_norm": 0.3539100216907783,
+ "learning_rate": 1.875131558346542e-05,
+ "loss": 0.3435,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13612182438373566,
+ "step": 890,
+ "valid_targets_mean": 2774.1,
+ "valid_targets_min": 729
+ },
+ {
+ "epoch": 2.8608000000000002,
+ "grad_norm": 0.40583212495073406,
+ "learning_rate": 1.8528708728006654e-05,
+ "loss": 0.3559,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1654030978679657,
+ "step": 895,
+ "valid_targets_mean": 3030.8,
+ "valid_targets_min": 803
+ },
+ {
+ "epoch": 2.8768000000000002,
+ "grad_norm": 0.45124674084690286,
+ "learning_rate": 1.8306284989520055e-05,
+ "loss": 0.3575,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22975601255893707,
+ "step": 900,
+ "valid_targets_mean": 5792.0,
+ "valid_targets_min": 713
+ },
+ {
+ "epoch": 2.8928000000000003,
+ "grad_norm": 0.374865102674942,
+ "learning_rate": 1.8084072050873265e-05,
+ "loss": 0.3526,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1920439600944519,
+ "step": 905,
+ "valid_targets_mean": 4009.6,
+ "valid_targets_min": 404
+ },
+ {
+ "epoch": 2.9088000000000003,
+ "grad_norm": 0.33227134778852785,
+ "learning_rate": 1.786209756869775e-05,
+ "loss": 0.3495,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20190231502056122,
+ "step": 910,
+ "valid_targets_mean": 6347.8,
+ "valid_targets_min": 1377
+ },
+ {
+ "epoch": 2.9248,
+ "grad_norm": 0.35861025661918494,
+ "learning_rate": 1.764038916994669e-05,
+ "loss": 0.3448,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1896182894706726,
+ "step": 915,
+ "valid_targets_mean": 4381.7,
+ "valid_targets_min": 859
+ },
+ {
+ "epoch": 2.9408,
+ "grad_norm": 0.35561939433082074,
+ "learning_rate": 1.741897444845649e-05,
+ "loss": 0.3404,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22202759981155396,
+ "step": 920,
+ "valid_targets_mean": 4680.8,
+ "valid_targets_min": 1115
+ },
+ {
+ "epoch": 2.9568,
+ "grad_norm": 0.34917086301788913,
+ "learning_rate": 1.7197880961512498e-05,
+ "loss": 0.3537,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1740960329771042,
+ "step": 925,
+ "valid_targets_mean": 4159.9,
+ "valid_targets_min": 916
+ },
+ {
+ "epoch": 2.9728,
+ "grad_norm": 0.36394569930759646,
+ "learning_rate": 1.6977136226419187e-05,
+ "loss": 0.3671,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19525155425071716,
+ "step": 930,
+ "valid_targets_mean": 4545.5,
+ "valid_targets_min": 1010
+ },
+ {
+ "epoch": 2.9888,
+ "grad_norm": 0.38020671979315085,
+ "learning_rate": 1.6756767717075354e-05,
+ "loss": 0.3801,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20520594716072083,
+ "step": 935,
+ "valid_targets_mean": 3499.2,
+ "valid_targets_min": 1092
+ },
+ {
+ "epoch": 3.0032,
+ "grad_norm": 0.37605046406373455,
+ "learning_rate": 1.6536802860554723e-05,
+ "loss": 0.3727,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1897515505552292,
+ "step": 940,
+ "valid_targets_mean": 3898.6,
+ "valid_targets_min": 794
+ },
+ {
+ "epoch": 3.0192,
+ "grad_norm": 0.335356176522162,
+ "learning_rate": 1.631726903369238e-05,
+ "loss": 0.3438,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20420154929161072,
+ "step": 945,
+ "valid_targets_mean": 4534.9,
+ "valid_targets_min": 1332
+ },
+ {
+ "epoch": 3.0352,
+ "grad_norm": 0.4094788051075967,
+ "learning_rate": 1.609819355967744e-05,
+ "loss": 0.3595,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21997550129890442,
+ "step": 950,
+ "valid_targets_mean": 5079.9,
+ "valid_targets_min": 899
+ },
+ {
+ "epoch": 3.0512,
+ "grad_norm": 0.3776128013373243,
+ "learning_rate": 1.587960370465239e-05,
+ "loss": 0.3418,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1931406706571579,
+ "step": 955,
+ "valid_targets_mean": 4443.5,
+ "valid_targets_min": 942
+ },
+ {
+ "epoch": 3.0672,
+ "grad_norm": 0.3360958142209563,
+ "learning_rate": 1.5661526674319582e-05,
+ "loss": 0.3244,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13528843224048615,
+ "step": 960,
+ "valid_targets_mean": 4246.4,
+ "valid_targets_min": 829
+ },
+ {
+ "epoch": 3.0832,
+ "grad_norm": 0.4239322119841672,
+ "learning_rate": 1.544398961055516e-05,
+ "loss": 0.3404,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1699754148721695,
+ "step": 965,
+ "valid_targets_mean": 2840.8,
+ "valid_targets_min": 784
+ },
+ {
+ "epoch": 3.0992,
+ "grad_norm": 0.3993307352764048,
+ "learning_rate": 1.5227019588031035e-05,
+ "loss": 0.3537,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14093229174613953,
+ "step": 970,
+ "valid_targets_mean": 2824.7,
+ "valid_targets_min": 512
+ },
+ {
+ "epoch": 3.1152,
+ "grad_norm": 0.44328943732647597,
+ "learning_rate": 1.501064361084511e-05,
+ "loss": 0.3441,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20833003520965576,
+ "step": 975,
+ "valid_targets_mean": 3540.4,
+ "valid_targets_min": 676
+ },
+ {
+ "epoch": 3.1312,
+ "grad_norm": 0.3454439076872715,
+ "learning_rate": 1.47948886091604e-05,
+ "loss": 0.3388,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13593928515911102,
+ "step": 980,
+ "valid_targets_mean": 4455.2,
+ "valid_targets_min": 1037
+ },
+ {
+ "epoch": 3.1471999999999998,
+ "grad_norm": 0.38888253914726173,
+ "learning_rate": 1.4579781435853289e-05,
+ "loss": 0.3407,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2271447479724884,
+ "step": 985,
+ "valid_targets_mean": 4185.3,
+ "valid_targets_min": 865
+ },
+ {
+ "epoch": 3.1632,
+ "grad_norm": 0.34563917020171253,
+ "learning_rate": 1.4365348863171406e-05,
+ "loss": 0.3485,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.09505438804626465,
+ "step": 990,
+ "valid_targets_mean": 3164.1,
+ "valid_targets_min": 691
+ },
+ {
+ "epoch": 3.1792,
+ "grad_norm": 0.31352592534427964,
+ "learning_rate": 1.4151617579401551e-05,
+ "loss": 0.3483,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14270518720149994,
+ "step": 995,
+ "valid_targets_mean": 4691.3,
+ "valid_targets_min": 672
+ },
+ {
+ "epoch": 3.1952,
+ "grad_norm": 0.43447105744832676,
+ "learning_rate": 1.3938614185548094e-05,
+ "loss": 0.3432,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13177445530891418,
+ "step": 1000,
+ "valid_targets_mean": 2326.1,
+ "valid_targets_min": 672
+ },
+ {
+ "epoch": 3.2112,
+ "grad_norm": 0.3563119894321619,
+ "learning_rate": 1.3726365192022173e-05,
+ "loss": 0.3404,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1878964751958847,
+ "step": 1005,
+ "valid_targets_mean": 4940.8,
+ "valid_targets_min": 762
+ },
+ {
+ "epoch": 3.2272,
+ "grad_norm": 0.35363752093967743,
+ "learning_rate": 1.3514897015342257e-05,
+ "loss": 0.3253,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13989496231079102,
+ "step": 1010,
+ "valid_targets_mean": 4354.2,
+ "valid_targets_min": 523
+ },
+ {
+ "epoch": 3.2432,
+ "grad_norm": 0.4319224916079313,
+ "learning_rate": 1.3304235974846295e-05,
+ "loss": 0.3299,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1998392939567566,
+ "step": 1015,
+ "valid_targets_mean": 3906.0,
+ "valid_targets_min": 679
+ },
+ {
+ "epoch": 3.2592,
+ "grad_norm": 0.3495855473672183,
+ "learning_rate": 1.3094408289416052e-05,
+ "loss": 0.3475,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20489974319934845,
+ "step": 1020,
+ "valid_targets_mean": 5238.0,
+ "valid_targets_min": 668
+ },
+ {
+ "epoch": 3.2752,
+ "grad_norm": 0.4060553154313819,
+ "learning_rate": 1.2885440074213877e-05,
+ "loss": 0.3325,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22139576077461243,
+ "step": 1025,
+ "valid_targets_mean": 4605.4,
+ "valid_targets_min": 796
+ },
+ {
+ "epoch": 3.2912,
+ "grad_norm": 0.38852642508297336,
+ "learning_rate": 1.267735733743242e-05,
+ "loss": 0.3319,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19194181263446808,
+ "step": 1030,
+ "valid_targets_mean": 4457.9,
+ "valid_targets_min": 1327
+ },
+ {
+ "epoch": 3.3072,
+ "grad_norm": 0.36166900656776446,
+ "learning_rate": 1.2470185977057643e-05,
+ "loss": 0.3286,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1793379783630371,
+ "step": 1035,
+ "valid_targets_mean": 4544.1,
+ "valid_targets_min": 1022
+ },
+ {
+ "epoch": 3.3232,
+ "grad_norm": 0.36293178811713217,
+ "learning_rate": 1.2263951777645588e-05,
+ "loss": 0.3282,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.10857829451560974,
+ "step": 1040,
+ "valid_targets_mean": 1993.7,
+ "valid_targets_min": 911
+ },
+ {
+ "epoch": 3.3392,
+ "grad_norm": 0.5084148493340929,
+ "learning_rate": 1.2058680407113176e-05,
+ "loss": 0.3532,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1812812238931656,
+ "step": 1045,
+ "valid_targets_mean": 2746.2,
+ "valid_targets_min": 667
+ },
+ {
+ "epoch": 3.3552,
+ "grad_norm": 0.3430189590754264,
+ "learning_rate": 1.1854397413543626e-05,
+ "loss": 0.3523,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13636204600334167,
+ "step": 1050,
+ "valid_targets_mean": 4036.2,
+ "valid_targets_min": 811
+ },
+ {
+ "epoch": 3.3712,
+ "grad_norm": 0.331734814176711,
+ "learning_rate": 1.1651128222006713e-05,
+ "loss": 0.3327,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15516330301761627,
+ "step": 1055,
+ "valid_targets_mean": 4857.7,
+ "valid_targets_min": 729
+ },
+ {
+ "epoch": 3.3872,
+ "grad_norm": 0.3484206028478157,
+ "learning_rate": 1.1448898131394364e-05,
+ "loss": 0.3261,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.153816819190979,
+ "step": 1060,
+ "valid_targets_mean": 4114.8,
+ "valid_targets_min": 569
+ },
+ {
+ "epoch": 3.4032,
+ "grad_norm": 0.3944658789774959,
+ "learning_rate": 1.124773231127196e-05,
+ "loss": 0.36,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16333676874637604,
+ "step": 1065,
+ "valid_targets_mean": 4517.0,
+ "valid_targets_min": 933
+ },
+ {
+ "epoch": 3.4192,
+ "grad_norm": 0.44319906945749254,
+ "learning_rate": 1.1047655798745752e-05,
+ "loss": 0.3546,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1139206737279892,
+ "step": 1070,
+ "valid_targets_mean": 2351.4,
+ "valid_targets_min": 654
+ },
+ {
+ "epoch": 3.4352,
+ "grad_norm": 0.3562865395677016,
+ "learning_rate": 1.084869349534671e-05,
+ "loss": 0.3428,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19184201955795288,
+ "step": 1075,
+ "valid_targets_mean": 4532.2,
+ "valid_targets_min": 722
+ },
+ {
+ "epoch": 3.4512,
+ "grad_norm": 0.3779331561005202,
+ "learning_rate": 1.0650870163931275e-05,
+ "loss": 0.358,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15721634030342102,
+ "step": 1080,
+ "valid_targets_mean": 3805.3,
+ "valid_targets_min": 964
+ },
+ {
+ "epoch": 3.4672,
+ "grad_norm": 0.3606978223574716,
+ "learning_rate": 1.0454210425599426e-05,
+ "loss": 0.3508,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16364584863185883,
+ "step": 1085,
+ "valid_targets_mean": 3938.2,
+ "valid_targets_min": 697
+ },
+ {
+ "epoch": 3.4832,
+ "grad_norm": 0.37686671437971875,
+ "learning_rate": 1.0258738756630255e-05,
+ "loss": 0.3474,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17473503947257996,
+ "step": 1090,
+ "valid_targets_mean": 3853.2,
+ "valid_targets_min": 639
+ },
+ {
+ "epoch": 3.4992,
+ "grad_norm": 0.4429048555200298,
+ "learning_rate": 1.0064479485435737e-05,
+ "loss": 0.3574,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15065409243106842,
+ "step": 1095,
+ "valid_targets_mean": 2704.8,
+ "valid_targets_min": 762
+ },
+ {
+ "epoch": 3.5152,
+ "grad_norm": 0.39157782240601363,
+ "learning_rate": 9.871456789532736e-06,
+ "loss": 0.3448,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.170841783285141,
+ "step": 1100,
+ "valid_targets_mean": 3648.5,
+ "valid_targets_min": 714
+ },
+ {
+ "epoch": 3.5312,
+ "grad_norm": 0.4032833854947422,
+ "learning_rate": 9.679694692533909e-06,
+ "loss": 0.3401,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17664435505867004,
+ "step": 1105,
+ "valid_targets_mean": 3603.8,
+ "valid_targets_min": 426
+ },
+ {
+ "epoch": 3.5472,
+ "grad_norm": 0.3759820875959348,
+ "learning_rate": 9.489217061157744e-06,
+ "loss": 0.3525,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19742423295974731,
+ "step": 1110,
+ "valid_targets_mean": 3592.0,
+ "valid_targets_min": 958
+ },
+ {
+ "epoch": 3.5632,
+ "grad_norm": 0.44117131213661204,
+ "learning_rate": 9.30004760225806e-06,
+ "loss": 0.3412,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13590778410434723,
+ "step": 1115,
+ "valid_targets_mean": 2444.6,
+ "valid_targets_min": 490
+ },
+ {
+ "epoch": 3.5792,
+ "grad_norm": 0.35876633764128313,
+ "learning_rate": 9.112209859873479e-06,
+ "loss": 0.3414,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15319585800170898,
+ "step": 1120,
+ "valid_targets_mean": 3815.9,
+ "valid_targets_min": 704
+ },
+ {
+ "epoch": 3.5952,
+ "grad_norm": 0.34990276526484193,
+ "learning_rate": 8.925727212297154e-06,
+ "loss": 0.3407,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.11832721531391144,
+ "step": 1125,
+ "valid_targets_mean": 3515.6,
+ "valid_targets_min": 796
+ },
+ {
+ "epoch": 3.6112,
+ "grad_norm": 0.4211378285934783,
+ "learning_rate": 8.74062286916705e-06,
+ "loss": 0.3524,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19017146527767181,
+ "step": 1130,
+ "valid_targets_mean": 3507.1,
+ "valid_targets_min": 685
+ },
+ {
+ "epoch": 3.6272,
+ "grad_norm": 0.522703439122451,
+ "learning_rate": 8.55691986857733e-06,
+ "loss": 0.3489,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18660058081150055,
+ "step": 1135,
+ "valid_targets_mean": 2299.6,
+ "valid_targets_min": 660
+ },
+ {
+ "epoch": 3.6432,
+ "grad_norm": 0.3408305954246411,
+ "learning_rate": 8.374641074210979e-06,
+ "loss": 0.3171,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18012124300003052,
+ "step": 1140,
+ "valid_targets_mean": 5196.1,
+ "valid_targets_min": 713
+ },
+ {
+ "epoch": 3.6592000000000002,
+ "grad_norm": 0.3788128493752177,
+ "learning_rate": 8.193809172494249e-06,
+ "loss": 0.3547,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1889040768146515,
+ "step": 1145,
+ "valid_targets_mean": 3868.0,
+ "valid_targets_min": 563
+ },
+ {
+ "epoch": 3.6752000000000002,
+ "grad_norm": 0.3195327070645042,
+ "learning_rate": 8.014446669773061e-06,
+ "loss": 0.3508,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17038068175315857,
+ "step": 1150,
+ "valid_targets_mean": 5676.6,
+ "valid_targets_min": 1112
+ },
+ {
+ "epoch": 3.6912000000000003,
+ "grad_norm": 0.3292140039943816,
+ "learning_rate": 7.83657588951187e-06,
+ "loss": 0.3547,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18771226704120636,
+ "step": 1155,
+ "valid_targets_mean": 6315.8,
+ "valid_targets_min": 702
+ },
+ {
+ "epoch": 3.7072000000000003,
+ "grad_norm": 0.36796060727007385,
+ "learning_rate": 7.66021896951529e-06,
+ "loss": 0.314,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13613468408584595,
+ "step": 1160,
+ "valid_targets_mean": 4006.8,
+ "valid_targets_min": 777
+ },
+ {
+ "epoch": 3.7232,
+ "grad_norm": 0.4094060697982775,
+ "learning_rate": 7.485397859172841e-06,
+ "loss": 0.3557,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16176709532737732,
+ "step": 1165,
+ "valid_targets_mean": 3108.9,
+ "valid_targets_min": 922
+ },
+ {
+ "epoch": 3.7392,
+ "grad_norm": 0.34905958959203853,
+ "learning_rate": 7.312134316727093e-06,
+ "loss": 0.3372,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16702544689178467,
+ "step": 1170,
+ "valid_targets_mean": 4760.9,
+ "valid_targets_min": 1183
+ },
+ {
+ "epoch": 3.7552,
+ "grad_norm": 0.35661452032445096,
+ "learning_rate": 7.140449906565656e-06,
+ "loss": 0.3375,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15472249686717987,
+ "step": 1175,
+ "valid_targets_mean": 3650.1,
+ "valid_targets_min": 656
+ },
+ {
+ "epoch": 3.7712,
+ "grad_norm": 0.37358322798596966,
+ "learning_rate": 6.970365996537285e-06,
+ "loss": 0.3413,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16219046711921692,
+ "step": 1180,
+ "valid_targets_mean": 4001.8,
+ "valid_targets_min": 862
+ },
+ {
+ "epoch": 3.7872,
+ "grad_norm": 0.3749806820714067,
+ "learning_rate": 6.801903755292403e-06,
+ "loss": 0.3533,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1868247240781784,
+ "step": 1185,
+ "valid_targets_mean": 4775.1,
+ "valid_targets_min": 578
+ },
+ {
+ "epoch": 3.8032,
+ "grad_norm": 0.38429670420945505,
+ "learning_rate": 6.635084149648481e-06,
+ "loss": 0.3661,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.12616121768951416,
+ "step": 1190,
+ "valid_targets_mean": 2768.1,
+ "valid_targets_min": 563
+ },
+ {
+ "epoch": 3.8192,
+ "grad_norm": 0.36030429317287976,
+ "learning_rate": 6.469927941980483e-06,
+ "loss": 0.346,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.186874657869339,
+ "step": 1195,
+ "valid_targets_mean": 4400.2,
+ "valid_targets_min": 843
+ },
+ {
+ "epoch": 3.8352,
+ "grad_norm": 0.35202133068417446,
+ "learning_rate": 6.30645568763681e-06,
+ "loss": 0.339,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18812918663024902,
+ "step": 1200,
+ "valid_targets_mean": 4493.2,
+ "valid_targets_min": 730
+ },
+ {
+ "epoch": 3.8512,
+ "grad_norm": 0.32683827554231104,
+ "learning_rate": 6.144687732380963e-06,
+ "loss": 0.3287,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14614273607730865,
+ "step": 1205,
+ "valid_targets_mean": 4499.8,
+ "valid_targets_min": 820
+ },
+ {
+ "epoch": 3.8672,
+ "grad_norm": 0.41211985482110514,
+ "learning_rate": 5.9846442098592895e-06,
+ "loss": 0.3321,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1370638906955719,
+ "step": 1210,
+ "valid_targets_mean": 2420.4,
+ "valid_targets_min": 619
+ },
+ {
+ "epoch": 3.8832,
+ "grad_norm": 0.34359938713276317,
+ "learning_rate": 5.826345039095178e-06,
+ "loss": 0.3399,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16373825073242188,
+ "step": 1215,
+ "valid_targets_mean": 4538.1,
+ "valid_targets_min": 695
+ },
+ {
+ "epoch": 3.8992,
+ "grad_norm": 0.35681929563367065,
+ "learning_rate": 5.669809922009937e-06,
+ "loss": 0.3239,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1745469570159912,
+ "step": 1220,
+ "valid_targets_mean": 4123.0,
+ "valid_targets_min": 960
+ },
+ {
+ "epoch": 3.9152,
+ "grad_norm": 0.3230534112138346,
+ "learning_rate": 5.515058340970665e-06,
+ "loss": 0.3512,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16375651955604553,
+ "step": 1225,
+ "valid_targets_mean": 4488.4,
+ "valid_targets_min": 1016
+ },
+ {
+ "epoch": 3.9312,
+ "grad_norm": 0.3788271465767256,
+ "learning_rate": 5.362109556365496e-06,
+ "loss": 0.3407,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.140263631939888,
+ "step": 1230,
+ "valid_targets_mean": 3269.2,
+ "valid_targets_min": 752
+ },
+ {
+ "epoch": 3.9472,
+ "grad_norm": 0.3908033970184151,
+ "learning_rate": 5.2109826042064445e-06,
+ "loss": 0.3464,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15427762269973755,
+ "step": 1235,
+ "valid_targets_mean": 2903.1,
+ "valid_targets_min": 687
+ },
+ {
+ "epoch": 3.9632,
+ "grad_norm": 0.31824838313855275,
+ "learning_rate": 5.0616962937601945e-06,
+ "loss": 0.3375,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17652173340320587,
+ "step": 1240,
+ "valid_targets_mean": 6038.0,
+ "valid_targets_min": 711
+ },
+ {
+ "epoch": 3.9792,
+ "grad_norm": 0.41835790896429187,
+ "learning_rate": 4.914269205207076e-06,
+ "loss": 0.34,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16578412055969238,
+ "step": 1245,
+ "valid_targets_mean": 3098.8,
+ "valid_targets_min": 1084
+ },
+ {
+ "epoch": 3.9952,
+ "grad_norm": 0.3239771649167043,
+ "learning_rate": 4.76871968732858e-06,
+ "loss": 0.3606,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15589842200279236,
+ "step": 1250,
+ "valid_targets_mean": 5305.8,
+ "valid_targets_min": 1187
+ },
+ {
+ "epoch": 4.0096,
+ "grad_norm": 0.35238292143174227,
+ "learning_rate": 4.625065855223689e-06,
+ "loss": 0.3435,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1814814805984497,
+ "step": 1255,
+ "valid_targets_mean": 4887.1,
+ "valid_targets_min": 847
+ },
+ {
+ "epoch": 4.0256,
+ "grad_norm": 0.45008459085343927,
+ "learning_rate": 4.483325588054259e-06,
+ "loss": 0.3363,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17144045233726501,
+ "step": 1260,
+ "valid_targets_mean": 3204.2,
+ "valid_targets_min": 489
+ },
+ {
+ "epoch": 4.0416,
+ "grad_norm": 0.38402873507610885,
+ "learning_rate": 4.343516526819755e-06,
+ "loss": 0.3283,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16901005804538727,
+ "step": 1265,
+ "valid_targets_mean": 3899.9,
+ "valid_targets_min": 731
+ },
+ {
+ "epoch": 4.0576,
+ "grad_norm": 0.3788020921136641,
+ "learning_rate": 4.205656072161681e-06,
+ "loss": 0.315,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2033902406692505,
+ "step": 1270,
+ "valid_targets_mean": 5402.2,
+ "valid_targets_min": 767
+ },
+ {
+ "epoch": 4.0736,
+ "grad_norm": 0.3924558007635478,
+ "learning_rate": 4.069761382197901e-06,
+ "loss": 0.3473,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14622555673122406,
+ "step": 1275,
+ "valid_targets_mean": 3266.4,
+ "valid_targets_min": 472
+ },
+ {
+ "epoch": 4.0896,
+ "grad_norm": 0.30034356754743674,
+ "learning_rate": 3.935849370387104e-06,
+ "loss": 0.3218,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15228131413459778,
+ "step": 1280,
+ "valid_targets_mean": 5391.9,
+ "valid_targets_min": 847
+ },
+ {
+ "epoch": 4.1056,
+ "grad_norm": 0.3250917202336007,
+ "learning_rate": 3.803936703423783e-06,
+ "loss": 0.3253,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14912036061286926,
+ "step": 1285,
+ "valid_targets_mean": 5266.2,
+ "valid_targets_min": 914
+ },
+ {
+ "epoch": 4.1216,
+ "grad_norm": 0.31424788351984845,
+ "learning_rate": 3.6740397991638864e-06,
+ "loss": 0.3152,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14228126406669617,
+ "step": 1290,
+ "valid_targets_mean": 4604.3,
+ "valid_targets_min": 936
+ },
+ {
+ "epoch": 4.1376,
+ "grad_norm": 0.36643651361028096,
+ "learning_rate": 3.5461748245814633e-06,
+ "loss": 0.3371,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1840534508228302,
+ "step": 1295,
+ "valid_targets_mean": 4418.9,
+ "valid_targets_min": 713
+ },
+ {
+ "epoch": 4.1536,
+ "grad_norm": 0.4546431971157884,
+ "learning_rate": 3.420357693756502e-06,
+ "loss": 0.3471,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17221775650978088,
+ "step": 1300,
+ "valid_targets_mean": 3512.8,
+ "valid_targets_min": 728
+ },
+ {
+ "epoch": 4.1696,
+ "grad_norm": 0.3805593027549495,
+ "learning_rate": 3.2966040658942666e-06,
+ "loss": 0.3371,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1598142385482788,
+ "step": 1305,
+ "valid_targets_mean": 3979.7,
+ "valid_targets_min": 857
+ },
+ {
+ "epoch": 4.1856,
+ "grad_norm": 0.3335980469652557,
+ "learning_rate": 3.174929343376374e-06,
+ "loss": 0.3536,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20903745293617249,
+ "step": 1310,
+ "valid_targets_mean": 6657.6,
+ "valid_targets_min": 569
+ },
+ {
+ "epoch": 4.2016,
+ "grad_norm": 0.3604800331212591,
+ "learning_rate": 3.055348669843794e-06,
+ "loss": 0.3187,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.11375663429498672,
+ "step": 1315,
+ "valid_targets_mean": 3103.1,
+ "valid_targets_min": 448
+ },
+ {
+ "epoch": 4.2176,
+ "grad_norm": 0.28671003001950635,
+ "learning_rate": 2.937876928312062e-06,
+ "loss": 0.3431,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.11587075144052505,
+ "step": 1320,
+ "valid_targets_mean": 4968.6,
+ "valid_targets_min": 861
+ },
+ {
+ "epoch": 4.2336,
+ "grad_norm": 0.4017656623883969,
+ "learning_rate": 2.8225287393189547e-06,
+ "loss": 0.3269,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17351335287094116,
+ "step": 1325,
+ "valid_targets_mean": 3881.8,
+ "valid_targets_min": 1049
+ },
+ {
+ "epoch": 4.2496,
+ "grad_norm": 0.373357304461226,
+ "learning_rate": 2.709318459104815e-06,
+ "loss": 0.3235,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.174965038895607,
+ "step": 1330,
+ "valid_targets_mean": 4631.8,
+ "valid_targets_min": 535
+ },
+ {
+ "epoch": 4.2656,
+ "grad_norm": 0.4122717520251167,
+ "learning_rate": 2.5982601778257733e-06,
+ "loss": 0.3388,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.2125864326953888,
+ "step": 1335,
+ "valid_targets_mean": 3899.5,
+ "valid_targets_min": 950
+ },
+ {
+ "epoch": 4.2816,
+ "grad_norm": 0.3495807169302053,
+ "learning_rate": 2.4893677178000797e-06,
+ "loss": 0.3223,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21441903710365295,
+ "step": 1340,
+ "valid_targets_mean": 5717.9,
+ "valid_targets_min": 711
+ },
+ {
+ "epoch": 4.2976,
+ "grad_norm": 0.4388515044874441,
+ "learning_rate": 2.3826546317877795e-06,
+ "loss": 0.3761,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1932186335325241,
+ "step": 1345,
+ "valid_targets_mean": 3286.0,
+ "valid_targets_min": 567
+ },
+ {
+ "epoch": 4.3136,
+ "grad_norm": 0.34611942555321196,
+ "learning_rate": 2.278134201303952e-06,
+ "loss": 0.3265,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21352379024028778,
+ "step": 1350,
+ "valid_targets_mean": 5044.2,
+ "valid_targets_min": 558
+ },
+ {
+ "epoch": 4.3296,
+ "grad_norm": 0.3551898027835839,
+ "learning_rate": 2.1758194349656624e-06,
+ "loss": 0.357,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16915194690227509,
+ "step": 1355,
+ "valid_targets_mean": 4792.9,
+ "valid_targets_min": 655
+ },
+ {
+ "epoch": 4.3456,
+ "grad_norm": 0.39252254953522203,
+ "learning_rate": 2.075723066872939e-06,
+ "loss": 0.3279,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13038407266139984,
+ "step": 1360,
+ "valid_targets_mean": 3598.0,
+ "valid_targets_min": 729
+ },
+ {
+ "epoch": 4.3616,
+ "grad_norm": 0.33078484844023676,
+ "learning_rate": 1.977857555023854e-06,
+ "loss": 0.3422,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.09628881514072418,
+ "step": 1365,
+ "valid_targets_mean": 3414.8,
+ "valid_targets_min": 499
+ },
+ {
+ "epoch": 4.3776,
+ "grad_norm": 0.3784326265909576,
+ "learning_rate": 1.8822350797640543e-06,
+ "loss": 0.3449,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15875539183616638,
+ "step": 1370,
+ "valid_targets_mean": 3690.6,
+ "valid_targets_min": 721
+ },
+ {
+ "epoch": 4.3936,
+ "grad_norm": 0.35061428660208377,
+ "learning_rate": 1.788867542270729e-06,
+ "loss": 0.3443,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17543794214725494,
+ "step": 1375,
+ "valid_targets_mean": 4049.7,
+ "valid_targets_min": 427
+ },
+ {
+ "epoch": 4.4096,
+ "grad_norm": 0.38301901048882014,
+ "learning_rate": 1.6977665630714345e-06,
+ "loss": 0.3392,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1260443776845932,
+ "step": 1380,
3045
+ "valid_targets_mean": 2746.2,
3046
+ "valid_targets_min": 734
3047
+ },
3048
+ {
+ "epoch": 4.4256,
+ "grad_norm": 0.39167736257806884,
+ "learning_rate": 1.6089434805977799e-06,
+ "loss": 0.3486,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19264991581439972,
+ "step": 1385,
+ "valid_targets_mean": 4090.4,
+ "valid_targets_min": 732
+ },
+ {
+ "epoch": 4.4416,
+ "grad_norm": 0.3724692418976962,
+ "learning_rate": 1.5224093497742654e-06,
+ "loss": 0.3473,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19777175784111023,
+ "step": 1390,
+ "valid_targets_mean": 4353.8,
+ "valid_targets_min": 682
+ },
+ {
+ "epoch": 4.4576,
+ "grad_norm": 2.915445545430493,
+ "learning_rate": 1.4381749406423695e-06,
+ "loss": 0.3411,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19258227944374084,
+ "step": 1395,
+ "valid_targets_mean": 4315.4,
+ "valid_targets_min": 855
+ },
+ {
+ "epoch": 4.4736,
+ "grad_norm": 0.3805102691027801,
+ "learning_rate": 1.3562507370201062e-06,
+ "loss": 0.3402,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18339085578918457,
+ "step": 1400,
+ "valid_targets_mean": 3903.1,
+ "valid_targets_min": 764
+ },
+ {
+ "epoch": 4.4896,
+ "grad_norm": 0.37129014250841613,
+ "learning_rate": 1.2766469351972345e-06,
+ "loss": 0.3156,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15153226256370544,
+ "step": 1405,
+ "valid_targets_mean": 3064.1,
+ "valid_targets_min": 699
+ },
+ {
+ "epoch": 4.5056,
+ "grad_norm": 0.3726092816106853,
+ "learning_rate": 1.1993734426661985e-06,
+ "loss": 0.3506,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1535291075706482,
+ "step": 1410,
+ "valid_targets_mean": 3952.9,
+ "valid_targets_min": 543
+ },
+ {
+ "epoch": 4.5216,
+ "grad_norm": 0.3591276212261323,
+ "learning_rate": 1.1244398768890496e-06,
+ "loss": 0.3161,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1446693241596222,
+ "step": 1415,
+ "valid_targets_mean": 3641.4,
+ "valid_targets_min": 892
+ },
+ {
+ "epoch": 4.5376,
+ "grad_norm": 0.3912764396308065,
+ "learning_rate": 1.0518555641004613e-06,
+ "loss": 0.3421,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.10798218846321106,
+ "step": 1420,
+ "valid_targets_mean": 2451.6,
+ "valid_targets_min": 699
+ },
+ {
+ "epoch": 4.5536,
+ "grad_norm": 0.3411240761105926,
+ "learning_rate": 9.816295381469954e-07,
+ "loss": 0.3401,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19344064593315125,
+ "step": 1425,
+ "valid_targets_mean": 5260.7,
+ "valid_targets_min": 430
+ },
+ {
+ "epoch": 4.5696,
+ "grad_norm": 0.3991311146571275,
+ "learning_rate": 9.137705393627239e-07,
+ "loss": 0.3458,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17564912140369415,
+ "step": 1430,
+ "valid_targets_mean": 3772.7,
+ "valid_targets_min": 771
+ },
+ {
+ "epoch": 4.5856,
+ "grad_norm": 0.4907284404134834,
+ "learning_rate": 8.482870134814214e-07,
+ "loss": 0.3632,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1778586059808731,
+ "step": 1435,
+ "valid_targets_mean": 2586.2,
+ "valid_targets_min": 1050
+ },
+ {
+ "epoch": 4.6016,
+ "grad_norm": 0.3983087034279942,
+ "learning_rate": 7.851871105854125e-07,
+ "loss": 0.3218,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19727054238319397,
+ "step": 1440,
+ "valid_targets_mean": 4347.9,
+ "valid_targets_min": 649
+ },
+ {
+ "epoch": 4.6176,
+ "grad_norm": 0.3428497731326709,
+ "learning_rate": 7.244786840912033e-07,
+ "loss": 0.3274,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17086507380008698,
+ "step": 1445,
+ "valid_targets_mean": 4404.5,
+ "valid_targets_min": 953
+ },
+ {
+ "epoch": 4.6336,
+ "grad_norm": 0.40290620049486076,
+ "learning_rate": 6.661692897720517e-07,
+ "loss": 0.3457,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20860999822616577,
+ "step": 1450,
+ "valid_targets_mean": 4176.6,
+ "valid_targets_min": 840
+ },
+ {
+ "epoch": 4.6495999999999995,
+ "grad_norm": 0.3960734519984612,
+ "learning_rate": 6.10266184817565e-07,
+ "loss": 0.3496,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.12197977304458618,
+ "step": 1455,
+ "valid_targets_mean": 3047.1,
+ "valid_targets_min": 706
+ },
+ {
+ "epoch": 4.6655999999999995,
+ "grad_norm": 0.3969892477990081,
+ "learning_rate": 5.567763269304927e-07,
+ "loss": 0.3343,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19510459899902344,
+ "step": 1460,
+ "valid_targets_mean": 4116.6,
+ "valid_targets_min": 707
+ },
+ {
+ "epoch": 4.6815999999999995,
+ "grad_norm": 0.4355543500009561,
+ "learning_rate": 5.057063734607392e-07,
+ "loss": 0.3518,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14991804957389832,
+ "step": 1465,
+ "valid_targets_mean": 3097.3,
+ "valid_targets_min": 725
+ },
+ {
+ "epoch": 4.6975999999999996,
+ "grad_norm": 0.35587447872188005,
+ "learning_rate": 4.570626805768119e-07,
+ "loss": 0.3197,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.17723935842514038,
+ "step": 1470,
+ "valid_targets_mean": 5329.2,
+ "valid_targets_min": 706
+ },
+ {
+ "epoch": 4.7136,
+ "grad_norm": 0.39446481317079257,
+ "learning_rate": 4.1085130247472625e-07,
+ "loss": 0.324,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14012877643108368,
+ "step": 1475,
+ "valid_targets_mean": 3485.1,
+ "valid_targets_min": 681
+ },
+ {
+ "epoch": 4.7296,
+ "grad_norm": 0.4072062654615722,
+ "learning_rate": 3.670779906244981e-07,
+ "loss": 0.3271,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19580386579036713,
+ "step": 1480,
+ "valid_targets_mean": 4416.1,
+ "valid_targets_min": 776
+ },
+ {
+ "epoch": 4.7456,
+ "grad_norm": 0.3084493418868167,
+ "learning_rate": 3.2574819305432713e-07,
+ "loss": 0.3248,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16733351349830627,
+ "step": 1485,
+ "valid_targets_mean": 4452.8,
+ "valid_targets_min": 698
+ },
+ {
+ "epoch": 4.7616,
+ "grad_norm": 0.3598510249644468,
+ "learning_rate": 2.8686705367250824e-07,
+ "loss": 0.3355,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.14451885223388672,
+ "step": 1490,
+ "valid_targets_mean": 2834.7,
+ "valid_targets_min": 632
+ },
+ {
+ "epoch": 4.7776,
+ "grad_norm": 0.4461356647281616,
+ "learning_rate": 2.504394116272502e-07,
+ "loss": 0.3403,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.16207420825958252,
+ "step": 1495,
+ "valid_targets_mean": 2442.2,
+ "valid_targets_min": 846
+ },
+ {
+ "epoch": 4.7936,
+ "grad_norm": 0.3939489927901413,
+ "learning_rate": 2.1646980070437973e-07,
+ "loss": 0.323,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.22261710464954376,
+ "step": 1500,
+ "valid_targets_mean": 4795.7,
+ "valid_targets_min": 944
+ },
+ {
+ "epoch": 4.8096,
+ "grad_norm": 0.3578185640710523,
+ "learning_rate": 1.8496244876306858e-07,
+ "loss": 0.3259,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.12147246301174164,
+ "step": 1505,
+ "valid_targets_mean": 2807.9,
+ "valid_targets_min": 922
+ },
+ {
+ "epoch": 4.8256,
+ "grad_norm": 0.3669760845147576,
+ "learning_rate": 1.559212772096319e-07,
+ "loss": 0.331,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.13725429773330688,
+ "step": 1510,
+ "valid_targets_mean": 3579.9,
+ "valid_targets_min": 1080
+ },
+ {
+ "epoch": 4.8416,
+ "grad_norm": 0.37618766080255067,
+ "learning_rate": 1.2934990050947228e-07,
+ "loss": 0.3334,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15945936739444733,
+ "step": 1515,
+ "valid_targets_mean": 4211.1,
+ "valid_targets_min": 660
+ },
+ {
+ "epoch": 4.8576,
+ "grad_norm": 0.498977453048548,
+ "learning_rate": 1.0525162573723269e-07,
+ "loss": 0.3489,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.15219208598136902,
+ "step": 1520,
+ "valid_targets_mean": 2302.8,
+ "valid_targets_min": 673
+ },
+ {
+ "epoch": 4.8736,
+ "grad_norm": 0.3680209079970337,
+ "learning_rate": 8.362945216517704e-08,
+ "loss": 0.3326,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.19316397607326508,
+ "step": 1525,
+ "valid_targets_mean": 4751.6,
+ "valid_targets_min": 829
+ },
+ {
+ "epoch": 4.8896,
+ "grad_norm": 0.3308997729831502,
+ "learning_rate": 6.448607088991532e-08,
+ "loss": 0.3455,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.1902647465467453,
+ "step": 1530,
+ "valid_targets_mean": 6998.1,
+ "valid_targets_min": 1277
+ },
+ {
+ "epoch": 4.9056,
+ "grad_norm": 0.3690095050432829,
+ "learning_rate": 4.782386449746934e-08,
+ "loss": 0.3253,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.128545343875885,
+ "step": 1535,
+ "valid_targets_mean": 3642.2,
+ "valid_targets_min": 677
+ },
+ {
+ "epoch": 4.9216,
+ "grad_norm": 0.4085506322401526,
+ "learning_rate": 3.3644906766734374e-08,
+ "loss": 0.334,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20792028307914734,
+ "step": 1540,
+ "valid_targets_mean": 4389.9,
+ "valid_targets_min": 275
+ },
+ {
+ "epoch": 4.9376,
+ "grad_norm": 0.41617364004087926,
+ "learning_rate": 2.1950962411367848e-08,
+ "loss": 0.3283,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.20928677916526794,
+ "step": 1545,
+ "valid_targets_mean": 4790.5,
+ "valid_targets_min": 1106
+ },
+ {
+ "epoch": 4.9536,
+ "grad_norm": 0.3449113707606809,
+ "learning_rate": 1.2743486860165022e-08,
+ "loss": 0.3304,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18289443850517273,
+ "step": 1550,
+ "valid_targets_mean": 4802.8,
+ "valid_targets_min": 823
+ },
+ {
+ "epoch": 4.9696,
+ "grad_norm": 0.39524375236617265,
+ "learning_rate": 6.023626075915001e-09,
+ "loss": 0.3408,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.18949632346630096,
+ "step": 1555,
+ "valid_targets_mean": 3546.4,
+ "valid_targets_min": 907
+ },
+ {
+ "epoch": 4.9856,
+ "grad_norm": 0.3612907490017341,
+ "learning_rate": 1.7922164127659457e-09,
+ "loss": 0.3266,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.21476075053215027,
+ "step": 1560,
+ "valid_targets_mean": 5423.8,
+ "valid_targets_min": 711
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 0.5832122688927835,
+ "learning_rate": 4.978451213499824e-11,
+ "loss": 0.337,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3492066562175751,
+ "step": 1565,
+ "valid_targets_mean": 3627.8,
+ "valid_targets_min": 954
+ },
+ {
+ "epoch": 5.0,
+ "loss_nan_ranks": 0,
+ "loss_rank_avg": 0.3492066562175751,
+ "step": 1565,
+ "total_flos": 7.336290139934556e+17,
+ "train_loss": 0.37341142744301986,
+ "train_runtime": 17388.125,
+ "train_samples_per_second": 2.876,
+ "train_steps_per_second": 0.09,
+ "valid_targets_mean": 3627.8,
+ "valid_targets_min": 954
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1565,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": false,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 7.336290139934556e+17,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
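The `learning_rate` values in `log_history` decay smoothly to almost zero by step 1565. As a consistency check, the sketch below models them as linear warmup followed by a half-cosine decay, the same shape as the `cosine` scheduler in Hugging Face Transformers (`get_cosine_schedule_with_warmup`). The peak rate (4e-5) and the warmup length (157 steps, consistent with a 0.1 warmup ratio rounded up) are assumptions inferred by fitting the logged values; neither is recorded in this file. Under those assumptions, each logged rate matches the schedule evaluated at `step - 1`, which suggests the logged value is the rate used for the update that produced that step.

```python
import math

# ASSUMPTIONS: inferred by fitting the logged learning rates, not stated in
# trainer_state.json.
PEAK_LR = 4e-5        # assumed peak learning rate
WARMUP_STEPS = 157    # assumed warmup length (ceil(0.1 * 1565))
# From the file itself.
TOTAL_STEPS = 1565    # "max_steps"
TRAIN_RUNTIME_S = 17388.125  # "train_runtime"


def cosine_with_warmup_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, learning_rate) pairs copied from log_history above.
LOGGED = [
    (1355, 2.1758194349656624e-06),
    (1500, 2.1646980070437973e-07),
    (1560, 1.7922164127659457e-09),
    (1565, 4.978451213499824e-11),
]


def max_relative_error() -> float:
    """Largest relative gap between the schedule at step - 1 and the logged rate."""
    worst = 0.0
    for step, logged in LOGGED:
        predicted = cosine_with_warmup_lr(step - 1)
        worst = max(worst, abs(predicted - logged) / logged)
    return worst


def steps_per_second() -> float:
    """Throughput implied by max_steps and train_runtime."""
    return TOTAL_STEPS / TRAIN_RUNTIME_S


if __name__ == "__main__":
    print(f"max relative error vs. log: {max_relative_error():.2e}")
    print(f"steps/s: {steps_per_second():.4f}")
```

Only `max_steps`, `train_runtime`, and the logged rates come from the file; the throughput line reproduces the logged `train_steps_per_second` of 0.09 (1565 / 17388.125 ≈ 0.0900).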
training_loss.png ADDED