train_multirc_42_1767887030

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.1760
  • Num Input Tokens Seen: 117191744
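Since this repository ships a PEFT adapter (see Framework versions below) rather than full model weights, it has to be loaded on top of the base model. The following is a minimal, illustrative sketch; the prompt template actually used during fine-tuning on MultiRC is not documented here, and the base model is gated and requires access approval.

```python
# Minimal loading sketch: attach the adapter in this repository to the base
# Meta-Llama-3-8B-Instruct model. Prompt format below is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_42_1767887030"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# MultiRC-style input (passage + question + candidate answer); the exact
# template used for training is not specified in this card.
prompt = (
    "Passage: ...\n"
    "Question: ...\n"
    "Candidate answer: ...\n"
    "Is the candidate answer correct?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```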

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
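
The sketch below maps these hyperparameters onto Hugging Face TrainingArguments. It reproduces only the values listed above; the PEFT/LoRA configuration, dataset preprocessing, and Trainer wiring used for this run are not documented in this card, so treat it as an assumption-laden starting point rather than the exact training script.

```python
# Sketch of the listed hyperparameters as TrainingArguments; output_dir is an
# assumed name, and PEFT config / dataset setup are intentionally omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_42_1767887030",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```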

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.1438        | 0.5000 | 6130   | 0.2489          | 5848496           |
| 0.4982        | 1.0001 | 12260  | 0.1760          | 11720992          |
| 0.0369        | 1.5001 | 18390  | 0.1900          | 17588784          |
| 0.0016        | 2.0002 | 24520  | 0.2060          | 23439824          |
| 0.2369        | 2.5002 | 30650  | 0.1767          | 29311440          |
| 0.0257        | 3.0002 | 36780  | 0.1950          | 35151152          |
| 0.223         | 3.5003 | 42910  | 0.2058          | 41024464          |
| 0.0016        | 4.0003 | 49040  | 0.2158          | 46873952          |
| 0.0035        | 4.5004 | 55170  | 0.2312          | 52719088          |
| 0.2967        | 5.0004 | 61300  | 0.2215          | 58598512          |
| 0.3395        | 5.5004 | 67430  | 0.2390          | 64472848          |
| 0.0033        | 6.0005 | 73560  | 0.2531          | 70330720          |
| 0.5761        | 6.5005 | 79690  | 0.2770          | 76208432          |
| 0.2139        | 7.0006 | 85820  | 0.2529          | 82055264          |
| 0.0006        | 7.5006 | 91950  | 0.2736          | 87914208          |
| 0.0006        | 8.0007 | 98080  | 0.2698          | 93778576          |
| 0.3193        | 8.5007 | 104210 | 0.2842          | 99666304          |
| 0.7517        | 9.0007 | 110340 | 0.2875          | 105492976         |
| 0.0009        | 9.5008 | 116470 | 0.2918          | 111354544         |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4