contemmcm/ef0fc60307b5ae3907cd626160675744

text2text-generation

Generated from Trainer

	---
	library_name: transformers
	license: apache-2.0
	base_model: google/mt5-xl
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: ef0fc60307b5ae3907cd626160675744
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# ef0fc60307b5ae3907cd626160675744

	This model is a fine-tuned version of [google/mt5-xl](https://huggingface.co/google/mt5-xl) on the fi-pl (Finnish-Polish) configuration of the Helsinki-NLP/opus_books dataset.
	At the last logged epoch (epoch 13, step 910), it achieves the following results on the evaluation set:
	- Loss: 2.2386
	- Data Size: 1.0
	- Epoch Runtime: 45.3162
	- Bleu: 3.2749
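
The card does not include a usage example, so the following is only a minimal sketch of how a checkpoint like this is typically loaded with `transformers`. It assumes the repository id `contemmcm/ef0fc60307b5ae3907cd626160675744`, that the model takes raw source text with no task prefix, and that the translation direction is Finnish to Polish; none of these is stated in the card. The helper name `translate` and the `max_new_tokens` default are illustrative.

```python
MODEL_ID = "contemmcm/ef0fc60307b5ae3907cd626160675744"


def translate(texts, model_id=MODEL_ID, max_new_tokens=128):
    """Generate outputs for a list of source sentences (hypothetical helper)."""
    # Imported lazily so that defining the function does not require the
    # libraries, or the multi-gigabyte checkpoint, until it is called.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model was trained on raw source text without a prefix.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["Hyvää huomenta."])` downloads the checkpoint on first use. With a BLEU of about 3.3 on the evaluation set, output quality should be expected to be low.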

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 32
	- total_eval_batch_size: 32
	- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: constant
	- num_epochs: 50
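
As a rough illustration of how the listed values relate to each other, the sketch below maps them onto `Seq2SeqTrainingArguments` keyword names and recomputes the total batch size. The original training script is not part of the card, so `output_dir`, `predict_with_generate=True`, and a gradient accumulation of 1 are assumptions, and the helper names are hypothetical.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
HPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}
NUM_DEVICES = 4  # num_devices in the list above


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def build_training_args(output_dir="mt5-xl-opus-books-fi-pl"):
    """Build training arguments; output_dir is an illustrative name."""
    from transformers import Seq2SeqTrainingArguments

    # Assumption: generation during evaluation, which BLEU scoring requires.
    return Seq2SeqTrainingArguments(
        output_dir=output_dir, predict_with_generate=True, **HPARAMS
    )


print(total_batch_size(HPARAMS["per_device_train_batch_size"], NUM_DEVICES))
```

The printed value matches `total_train_batch_size: 32`, which is consistent with 8 examples per device on 4 devices and no gradient accumulation.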

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
	|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
	| No log        | 0     | 0    | 7.4356          | 0         | 3.0217        | 0.0092 |
	| No log        | 1     | 70   | 5.7502          | 0.0078    | 3.8717        | 0.0183 |
	| No log        | 2     | 140  | 4.5681          | 0.0156    | 11.2349       | 0.0596 |
	| No log        | 3     | 210  | 4.1004          | 0.0312    | 16.1280       | 0.1179 |
	| No log        | 4     | 280  | 3.9024          | 0.0625    | 22.6958       | 0.1069 |
	| No log        | 5     | 350  | 3.3192          | 0.125     | 29.0845       | 0.1753 |
	| No log        | 6     | 420  | 2.6971          | 0.25      | 29.2950       | 0.3854 |
	| 0.6003        | 7     | 490  | 2.4368          | 0.5       | 36.5773       | 1.8668 |
	| 2.7265        | 8     | 560  | 2.2457          | 1.0       | 59.0803       | 2.1418 |
	| 2.3962        | 9     | 630  | 2.1955          | 1.0       | 43.3304       | 2.4797 |
	| 2.1264        | 10    | 700  | 2.1957          | 1.0       | 47.0717       | 2.3341 |
	| 1.9469        | 11    | 770  | 2.1982          | 1.0       | 45.4346       | 2.7379 |
	| 1.8626        | 12    | 840  | 2.2279          | 1.0       | 52.1217       | 3.0321 |
	| 1.6317        | 13    | 910  | 2.2386          | 1.0       | 45.3162       | 3.2749 |
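
To make the trends in the table easier to read, the sketch below copies the epoch, validation loss, data size, and BLEU columns and looks up the epochs with the lowest validation loss and the highest BLEU. It also checks an observation drawn from the table itself, not a documented schedule: the Data Size column appears to double each epoch until it reaches 1.0 at epoch 8. The helper names are illustrative.

```python
# (epoch, validation_loss, data_size, bleu), copied from the table above.
ROWS = [
    (0, 7.4356, 0.0, 0.0092),
    (1, 5.7502, 0.0078, 0.0183),
    (2, 4.5681, 0.0156, 0.0596),
    (3, 4.1004, 0.0312, 0.1179),
    (4, 3.9024, 0.0625, 0.1069),
    (5, 3.3192, 0.125, 0.1753),
    (6, 2.6971, 0.25, 0.3854),
    (7, 2.4368, 0.5, 1.8668),
    (8, 2.2457, 1.0, 2.1418),
    (9, 2.1955, 1.0, 2.4797),
    (10, 2.1957, 1.0, 2.3341),
    (11, 2.1982, 1.0, 2.7379),
    (12, 2.2279, 1.0, 3.0321),
    (13, 2.2386, 1.0, 3.2749),
]


def best_by(rows, column, lowest=True):
    """Return the row with the lowest (or highest) value in a column."""
    pick = min if lowest else max
    return pick(rows, key=lambda row: row[column])


def doubles_until_full(rows, full_epoch=8, tol=1e-4):
    """Check that data_size is about 2**(epoch - full_epoch) for epochs 1..full_epoch."""
    return all(
        abs(size - 2.0 ** (epoch - full_epoch)) < tol
        for epoch, _, size, _ in rows
        if 1 <= epoch <= full_epoch
    )


best_loss = best_by(ROWS, column=1, lowest=True)
best_bleu = best_by(ROWS, column=3, lowest=False)
print("lowest validation loss:", best_loss[1], "at epoch", best_loss[0])
print("highest BLEU:", best_bleu[3], "at epoch", best_bleu[0])
print("data size doubles until epoch 8:", doubles_until_full(ROWS))
```

Validation loss is lowest at epoch 9 and rises slightly afterwards, while BLEU keeps improving through epoch 13, so the two metrics favor different checkpoints.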


	### Framework versions

	- Transformers 4.57.0
	- PyTorch 2.8.0+cu128
	- Datasets 4.2.0
	- Tokenizers 0.22.1
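
When trying to reproduce the results, it can help to compare the local environment against the versions listed above. The following is a small sketch using the standard library's `importlib.metadata`. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are assumed here, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def version_mismatches(expected=EXPECTED, lookup=version):
    """Map each differing package to a (wanted, found) pair; found is None if missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

An empty result from `version_mismatches()` means the installed packages match the card exactly. A different CUDA build suffix on `torch` is reported as a mismatch even when the base version is the same.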