1e22b46b8eddf5a05a4230f8924cd07c

This model is a fine-tuned version of facebook/opt-125m on the nyu-mll/glue [wnli] dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
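
The total batch sizes listed above follow from the per-device batch sizes and the device count. The sketch below reproduces that arithmetic and shows one plausible mapping of the listed values onto `transformers.TrainingArguments` keyword names. The `output_dir` value is hypothetical, and a gradient accumulation of 1 is assumed because the card does not list one; this is not necessarily the exact configuration the author used.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 8
PER_DEVICE_EVAL_BATCH_SIZE = 8
NUM_DEVICES = 4


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per step across all devices.

    grad_accum_steps=1 is an assumption: the card lists no accumulation value.
    """
    return per_device * num_devices * grad_accum_steps


# Assumed mapping onto transformers.TrainingArguments keyword arguments.
# Kept as a plain dict so the sketch runs without transformers installed.
training_kwargs = {
    "output_dir": "opt-125m-wnli",  # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": PER_DEVICE_TRAIN_BATCH_SIZE,
    "per_device_eval_batch_size": PER_DEVICE_EVAL_BATCH_SIZE,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

total_train = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES)
total_eval = effective_batch_size(PER_DEVICE_EVAL_BATCH_SIZE, NUM_DEVICES)
print(f"total_train_batch_size={total_train}, total_eval_batch_size={total_eval}")
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`; the multi-GPU setup itself comes from the launcher rather than from these arguments.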

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	1.1491	0	0.8410	0.4219	0.2967	0.4219	0.4219	0.4219
No log	1	19	1.0610	0.0078	0.9634	0.5469	0.3535	0.5469	0.5469	0.5469
No log	2	38	0.7931	0.0156	0.9641	0.5469	0.3535	0.5469	0.5469	0.5469
No log	3	57	0.7310	0.0312	1.0373	0.5	0.4330	0.5	0.5	0.5
No log	4	76	0.8235	0.0625	1.0913	0.4062	0.3552	0.4062	0.4062	0.4062
No log	5	95	0.8062	0.125	1.0666	0.4531	0.3347	0.4531	0.4531	0.4531
0.0946	6	114	0.7590	0.25	1.2157	0.4219	0.3660	0.4219	0.4219	0.4219
0.0946	7	133	0.7375	0.5	1.6726	0.4375	0.3263	0.4375	0.4375	0.4375
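
To compare the logged checkpoints, the table rows can be scanned for the best value of each metric. The sketch below copies the Epoch, Step, Validation Loss, Accuracy, and F1 Macro columns from the table above; the `EvalRow` type and its field names are illustrative, not part of any library.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    step: int
    val_loss: float
    accuracy: float
    f1_macro: float


# Values transcribed from the training results table.
rows = [
    EvalRow(0, 0, 1.1491, 0.4219, 0.2967),
    EvalRow(1, 19, 1.0610, 0.5469, 0.3535),
    EvalRow(2, 38, 0.7931, 0.5469, 0.3535),
    EvalRow(3, 57, 0.7310, 0.5, 0.4330),
    EvalRow(4, 76, 0.8235, 0.4062, 0.3552),
    EvalRow(5, 95, 0.8062, 0.4531, 0.3347),
    EvalRow(6, 114, 0.7590, 0.4219, 0.3660),
    EvalRow(7, 133, 0.7375, 0.4375, 0.3263),
]

# min/max return the first matching row when several rows tie.
best_loss = min(rows, key=lambda r: r.val_loss)
best_acc = max(rows, key=lambda r: r.accuracy)
best_f1 = max(rows, key=lambda r: r.f1_macro)

print(f"lowest validation loss: {best_loss.val_loss} at epoch {best_loss.epoch}")
print(f"highest accuracy: {best_acc.accuracy} at epoch {best_acc.epoch}")
print(f"highest F1 macro: {best_f1.f1_macro} at epoch {best_f1.epoch}")
```

On these rows, validation loss is lowest (0.7310) and F1 macro highest (0.4330) at epoch 3, while accuracy peaks at 0.5469 at epochs 1 and 2. In every row the three ROUGE columns equal the accuracy column.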

Safetensors
Model size: 0.1B params
Tensor type: F32
