# train_siqa_42_1767887014
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set:
- Loss: 0.2367
- Num input tokens seen: 28,954,576
## Model description
More information needed
## Intended uses & limitations
More information needed
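In the absence of documented usage, here is a minimal loading sketch. It assumes the repository hosts a PEFT adapter on top of the base model (consistent with the PEFT framework version listed below); the prompt format used during fine-tuning is not documented, so the generation call is only illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; requires an accepted license
adapter_id = "rbelanec/train_siqa_42_1767887014"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative prompt only: the training template is not documented in this card.
messages = [{"role": "user", "content": "Tracy went out of her way to help a friend move. Why did Tracy do this?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```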
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
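For reference, these settings map onto `transformers` `TrainingArguments` roughly as sketched below. This is a reconstruction from the list above, not the original training script; `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above; not the original script.
training_args = TrainingArguments(
    output_dir="train_siqa_42_1767887014",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```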
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.7041 | 0.5000 | 7518 | 0.3190 | 1448896 |
| 0.1653 | 1.0001 | 15036 | 0.2600 | 2896184 |
| 0.2541 | 1.5001 | 22554 | 0.2703 | 4343416 |
| 0.0854 | 2.0001 | 30072 | 0.2367 | 5792544 |
| 0.3901 | 2.5002 | 37590 | 0.3058 | 7242912 |
| 0.0042 | 3.0002 | 45108 | 0.2668 | 8688552 |
| 0.4883 | 3.5002 | 52626 | 0.3042 | 10135720 |
| 1.3939 | 4.0003 | 60144 | 0.2776 | 11583920 |
| 0.5002 | 4.5003 | 67662 | 0.3179 | 13032336 |
| 0.0026 | 5.0003 | 75180 | 0.2885 | 14478408 |
| 0.4854 | 5.5004 | 82698 | 0.3194 | 15926280 |
| 0.0095 | 6.0004 | 90216 | 0.3346 | 17373656 |
| 0.0395 | 6.5004 | 97734 | 0.3460 | 18821816 |
| 0.0021 | 7.0005 | 105252 | 0.3421 | 20269568 |
| 0.3943 | 7.5005 | 112770 | 0.3770 | 21717600 |
| 0.5453 | 8.0005 | 120288 | 0.3536 | 23165384 |
| 0.0018 | 8.5006 | 127806 | 0.3693 | 24612584 |
| 0.0009 | 9.0006 | 135324 | 0.3713 | 26060784 |
| 0.3855 | 9.5006 | 142842 | 0.3735 | 27509360 |
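Validation loss reaches its minimum (0.2367) at epoch 2 and drifts upward over the remaining epochs, which matches the evaluation-set loss reported above; this suggests the epoch-2 checkpoint, the best by validation loss, is the one that was kept, while later epochs overfit.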
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4