my_final_llama_model_v2_add_wiki_fix_resume

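This model was trained from scratch. The training dataset is not specified (the card records it as None). It achieves the following results on the evaluation set:

Loss: 4.0000 (validation loss at the final step, 8000)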

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
training_steps: 8000
mixed_precision_training: Native AMP
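
The relationship between the batch-size entries and the shape of the linear schedule can be checked with a small sketch. It assumes a single training device and zero warmup steps, neither of which is stated in this card; `NUM_DEVICES`, `WARMUP_STEPS`, and both helper functions are hypothetical names introduced for illustration and are not part of any library.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
TRAINING_STEPS = 8000

# Assumptions (not stated in the card): one device, no warmup.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size(per_device, accum_steps, num_devices):
    """Number of sequences consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Linear warmup to base_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 2000, 4000, 8000):
        print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 64, and the learning rate falls to half its initial value at step 4000.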

Training Loss	Epoch	Step	Validation Loss
4.2118	0.1537	500	4.1942
4.1614	0.3074	1000	4.1656
4.1253	0.4611	1500	4.1383
4.1131	0.6148	2000	4.1153
4.0994	0.7685	2500	4.0954
4.0668	0.9222	3000	4.0779
4.0383	1.0756	3500	4.0644
4.049	1.2293	4000	4.0529
4.0126	1.3830	4500	4.0404
3.9655	1.5367	5000	4.0297
3.9358	1.6904	5500	4.0210
3.9137	1.8441	6000	4.0134
4.0083	1.9978	6500	4.0067
3.8427	2.1512	7000	4.0045
3.8823	2.3049	7500	4.0014
3.9662	2.4586	8000	4.0000
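
To put the validation losses in more familiar terms, the sketch below converts a loss to perplexity and estimates the number of optimizer steps per epoch from the Step and Epoch columns. It assumes the loss is a mean per-token cross-entropy in nats, which is common for causal language models but is not confirmed by this card; the helper names are hypothetical.

```python
import math

# (step, epoch, validation loss) rows copied from the table above.
ROWS = [
    (500, 0.1537, 4.1942),
    (3000, 0.9222, 4.0779),
    (8000, 2.4586, 4.0000),
]


def perplexity(loss_nats):
    """Perplexity, assuming the loss is mean cross-entropy in nats per token."""
    return math.exp(loss_nats)


def steps_per_epoch(step, epoch):
    """Optimizer steps in one epoch, inferred from a table row."""
    return step / epoch


if __name__ == "__main__":
    for step, epoch, val_loss in ROWS:
        print(f"step {step}: perplexity ~ {perplexity(val_loss):.1f}, "
              f"steps/epoch ~ {steps_per_epoch(step, epoch):.0f}")
```

With the final validation loss of 4.0000, this gives a perplexity of roughly 54.6, and the rows imply a little over 3,250 optimizer steps per epoch.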

Safetensors

Model size: 64.4M params
Tensor type: F32
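
A rough estimate of the weight storage follows from the parameter count and tensor type. This is a back-of-the-envelope sketch: the 64.4M figure is rounded, file-format overhead is ignored, and the F16 entry is included only as a hypothetical comparison, since the card lists F32 alone.

```python
# Parameter count as reported above (rounded in the card).
PARAM_COUNT = 64.4e6

# Bytes per parameter; F16 is a hypothetical comparison, not from the card.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weights_size_mb(param_count, dtype):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        print(f"{dtype}: ~{weights_size_mb(PARAM_COUNT, dtype):.1f} MB")
```

At four bytes per parameter, the F32 weights come to roughly 258 MB.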
