---
library_name: peft
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
tags:
- generated_from_trainer
model-index:
- name: smollm2-lora-hrz1ceio-1742335328
  results: []
---
# smollm2-lora-hrz1ceio-1742335328

This model is a LoRA adapter for [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct), fine-tuned with the PEFT library on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.4598
- Perplexity: 11.7017

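The reported perplexity is the exponential of the evaluation loss, which is easy to verify:

```python
import math

# exp(eval loss) reproduces the reported perplexity
print(math.exp(2.4598))  # ≈ 11.7017
```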
## Model description

This is a LoRA adapter trained with the PEFT library on top of SmolLM2-1.7B-Instruct; beyond the training configuration recorded below, more information is needed.
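As a minimal loading sketch with `peft` (the adapter repository id below is a placeholder assumption; substitute the actual repo id):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
adapter_id = "<namespace>/smollm2-lora-hrz1ceio-1742335328"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# SmolLM2-Instruct is a chat model, so format the prompt with its chat template.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello, how are you?"}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(input_ids=inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```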
## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a matching `TrainingArguments` sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 50
- mixed_precision_training: Native AMP

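A minimal sketch of the corresponding `transformers.TrainingArguments` (the output directory is an illustrative assumption):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smollm2-lora-hrz1ceio-1742335328",  # assumption: any local path works
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # effective batch size: 1 * 16 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed precision
)
```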
### Training results

| Training Loss | Epoch | Step | Validation Loss | Perplexity |
|:-------------:|:-----:|:----:|:---------------:|:----------:|
| 3.8247 | 1.48 | 10 | 3.8893 | 48.8773 |
| 3.7821 | 2.96 | 20 | 3.8282 | 45.9783 |
| 3.7187 | 4.32 | 30 | 3.7582 | 42.8707 |
| 3.6627 | 5.8 | 40 | 3.6850 | 39.8460 |
| 3.5609 | 7.16 | 50 | 3.6080 | 36.8919 |
| 3.4883 | 8.64 | 60 | 3.5351 | 34.2986 |
| 3.3997 | 10.0 | 70 | 3.4620 | 31.8798 |
| 3.3753 | 11.48 | 80 | 3.3897 | 29.6568 |
| 3.249 | 12.96 | 90 | 3.3187 | 27.6251 |
| 3.2619 | 14.32 | 100 | 3.2492 | 25.7684 |
| 3.1184 | 15.8 | 110 | 3.1800 | 24.0468 |
| 3.0852 | 17.16 | 120 | 3.1133 | 22.4950 |
| 2.9999 | 18.64 | 130 | 3.0489 | 21.0926 |
| 2.9315 | 20.0 | 140 | 2.9865 | 19.8159 |
| 2.848 | 21.48 | 150 | 2.9260 | 18.6519 |
| 2.8046 | 22.96 | 160 | 2.8679 | 17.6005 |
| 2.7379 | 24.32 | 170 | 2.8153 | 16.6979 |
| 2.704 | 25.8 | 180 | 2.7637 | 15.8585 |
| 2.6349 | 27.16 | 190 | 2.7168 | 15.1316 |
| 2.5972 | 28.64 | 200 | 2.6749 | 14.5109 |
| 2.5585 | 30.0 | 210 | 2.6327 | 13.9107 |
| 2.5502 | 31.48 | 220 | 2.5979 | 13.4359 |
| 2.5166 | 32.96 | 230 | 2.5670 | 13.0264 |
| 2.4733 | 34.32 | 240 | 2.5395 | 12.6737 |
| 2.4502 | 35.8 | 250 | 2.5152 | 12.3691 |
| 2.4268 | 37.16 | 260 | 2.4965 | 12.1393 |
| 2.365 | 38.64 | 270 | 2.4808 | 11.9507 |
| 2.4208 | 40.0 | 280 | 2.4707 | 11.8304 |
| 2.3818 | 41.48 | 290 | 2.4623 | 11.7311 |
| 2.391 | 42.96 | 300 | 2.4598 | 11.7017 |
### Framework versions

- PEFT 0.14.0
- Transformers 4.48.2
- Pytorch 2.1.0+cu118
- Datasets 3.4.1
- Tokenizers 0.21.1

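To check that a local environment matches these versions (a quick sanity check, assuming the packages are installed):

```python
import datasets
import peft
import tokenizers
import torch
import transformers

# Expected: 0.14.0, 4.48.2, 2.1.0+cu118, 3.4.1, 0.21.1
for module in (peft, transformers, torch, datasets, tokenizers):
    print(module.__name__, module.__version__)
```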