train_multirc_42_1767887030

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.1760
  • Num Input Tokens Seen: 117191744
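Since this repository ships a PEFT adapter (see Framework versions below) rather than full model weights, it has to be loaded on top of the base model. The following is a minimal, illustrative sketch; the prompt template actually used during fine-tuning on MultiRC is not documented here, and the base model is gated and requires access approval.

```python
# Minimal loading sketch: attach the adapter in this repository to the base
# Meta-Llama-3-8B-Instruct model. Prompt format below is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_42_1767887030"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# MultiRC-style input (passage + question + candidate answer); the exact
# template used for training is not specified in this card.
prompt = (
    "Passage: ...\n"
    "Question: ...\n"
    "Candidate answer: ...\n"
    "Is the candidate answer correct?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```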

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
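
The sketch below maps these hyperparameters onto Hugging Face TrainingArguments. It reproduces only the values listed above; the PEFT/LoRA configuration, dataset preprocessing, and Trainer wiring used for this run are not documented in this card, so treat it as an assumption-laden starting point rather than the exact training script.

```python
# Sketch of the listed hyperparameters as TrainingArguments; output_dir is an
# assumed name, and PEFT config / dataset setup are intentionally omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_42_1767887030",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```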

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.1438        | 0.5000 | 6130   | 0.2489          | 5848496           |
| 0.4982        | 1.0001 | 12260  | 0.1760          | 11720992          |
| 0.0369        | 1.5001 | 18390  | 0.1900          | 17588784          |
| 0.0016        | 2.0002 | 24520  | 0.2060          | 23439824          |
| 0.2369        | 2.5002 | 30650  | 0.1767          | 29311440          |
| 0.0257        | 3.0002 | 36780  | 0.1950          | 35151152          |
| 0.223         | 3.5003 | 42910  | 0.2058          | 41024464          |
| 0.0016        | 4.0003 | 49040  | 0.2158          | 46873952          |
| 0.0035        | 4.5004 | 55170  | 0.2312          | 52719088          |
| 0.2967        | 5.0004 | 61300  | 0.2215          | 58598512          |
| 0.3395        | 5.5004 | 67430  | 0.2390          | 64472848          |
| 0.0033        | 6.0005 | 73560  | 0.2531          | 70330720          |
| 0.5761        | 6.5005 | 79690  | 0.2770          | 76208432          |
| 0.2139        | 7.0006 | 85820  | 0.2529          | 82055264          |
| 0.0006        | 7.5006 | 91950  | 0.2736          | 87914208          |
| 0.0006        | 8.0007 | 98080  | 0.2698          | 93778576          |
| 0.3193        | 8.5007 | 104210 | 0.2842          | 99666304          |
| 0.7517        | 9.0007 | 110340 | 0.2875          | 105492976         |
| 0.0009        | 9.5008 | 116470 | 0.2918          | 111354544         |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4