thiomajid committed · Commit 2377a2a · verified · 1 Parent(s): 5deb6c3

Model save
README.md CHANGED
@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 10.4345
+- Loss: 9.6934
 
 ## Model description
 
@@ -39,23 +39,27 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.002
-- train_batch_size: 32
-- eval_batch_size: 32
+- train_batch_size: 64
+- eval_batch_size: 64
 - seed: 1652
 - gradient_accumulation_steps: 5
-- total_train_batch_size: 160
+- total_train_batch_size: 320
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.2
-- num_epochs: 2
+- num_epochs: 6
 - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 62.6423 | 1.0 | 401 | 11.5361 |
-| 53.3503 | 2.0 | 802 | 10.4345 |
+| 213.2576 | 1.0 | 201 | 22.5111 |
+| 61.0713 | 2.0 | 402 | 11.1953 |
+| 53.5773 | 3.0 | 603 | 10.4559 |
+| 51.1444 | 4.0 | 804 | 10.0289 |
+| 49.6945 | 5.0 | 1005 | 9.7750 |
+| 48.9875 | 6.0 | 1206 | 9.6934 |
 
 
 ### Framework versions
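The hyperparameter changes hang together arithmetically: the Trainer reports total_train_batch_size as per-device batch size × gradient accumulation steps × device count, and 320 = 64 × 5 implies a single device (an inference, not stated in the card). The cosine schedule with warmup_ratio 0.2 likewise follows from the step counts in the results table. A minimal sketch of both, assuming the usual warmup-then-cosine shape of transformers' `get_cosine_schedule_with_warmup` and ceiling rounding for the warmup step count:

```python
import math

# Figures taken from the updated model card; num_devices is an assumption
# (320 / (64 * 5) = 1 suggests a single GPU).
train_batch_size = 64
gradient_accumulation_steps = 5
num_devices = 1

# Effective batch size as the HF Trainer reports it.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices  # 320

# The results table ends at step 1206 after 6 epochs (201 steps/epoch).
total_steps = 1206
warmup_ratio = 0.2
# Assumed rounding: the Trainer derives warmup steps as ceil(ratio * total).
warmup_steps = math.ceil(warmup_ratio * total_steps)  # 242

def lr_multiplier(step: int) -> float:
    """Warmup-then-cosine factor applied to the base learning rate (0.002),
    mirroring the shape of transformers' cosine schedule with warmup."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Cosine decay from 1 to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With warmup_ratio 0.2, roughly the whole first epoch plus part of the second is spent ramping up, which is consistent with the large first-epoch training loss in the table.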
events.out.tfevents.1753992692.bdaec93ae06a.418.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:abba0c17ae395e1c98cc1251b3f20b9da02db707c87923b5534bc3bb96956601
+size 359
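The added TensorBoard event file is tracked with Git LFS, so the repository stores only this three-line pointer (key/value pairs per the LFS v1 spec) rather than the binary itself. A small sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper, not part of any library:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file (spec v1) into its key/value fields.
    Each line is '<key> <value>'; hypothetical helper for illustration."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The exact pointer contents committed in this change.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:abba0c17ae395e1c98cc1251b3f20b9da02db707c87923b5534bc3bb96956601
size 359"""

info = parse_lfs_pointer(pointer)
```

The `size` field (359 bytes) is the size of the real event file that `git lfs pull` would fetch, and `oid` is its SHA-256 content address.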