ec8e82b2b44eaa30abdf045a6f91b52d

This model is a fine-tuned version of albert/albert-xlarge-v2 on the nyu-mll/glue dataset. Evaluation-set results for each epoch are listed in the results table after the training hyperparameters.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
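
As a rough sketch of how the batch-size figures above relate to each other, the snippet below restates the listed hyperparameters as a plain dictionary and derives the total batch size from the per-device value and the device count. The key names follow the parameter names of `transformers.TrainingArguments`, which is an assumption about how the run was configured. The helper `total_batch_size` and the gradient-accumulation factor of 1 are also hypothetical, since the card lists no accumulation setting.

```python
# Hyperparameters copied from the list above. The key names mirror
# transformers.TrainingArguments; that this class was used is an assumption.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # num_devices from the card (multi-GPU)


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size across all devices (hypothetical helper).

    grad_accum_steps defaults to 1 because the card lists no accumulation.
    """
    return per_device * num_devices * grad_accum_steps


total_train = total_batch_size(training_kwargs["per_device_train_batch_size"], NUM_DEVICES)
total_eval = total_batch_size(training_kwargs["per_device_eval_batch_size"], NUM_DEVICES)
print(total_train, total_eval)
```

Both derived values come out to 32, matching total_train_batch_size and total_eval_batch_size in the list.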

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	7.0598	0	1.5330	7.0610	2.2267	-2.1586
No log	1	179	3.4001	0.0078	1.9432	3.4011	1.5444	-0.5215
No log	2	358	2.5721	0.0156	1.9087	2.5728	1.3266	-0.1509
No log	3	537	2.5621	0.0312	2.0639	2.5628	1.3228	-0.1465
No log	4	716	3.3451	0.0625	2.4991	3.3457	1.4775	-0.4967
No log	5	895	2.7295	0.125	3.2172	2.7302	1.3552	-0.2213
0.1514	6	1074	2.3139	0.25	4.6207	2.3147	1.2901	-0.0354
2.1915	7	1253	2.3530	0.5	7.6602	2.3538	1.2955	-0.0529
2.1701	8.0	1432	2.5341	1.0	13.6003	2.5348	1.3225	-0.1339
2.2039	9.0	1611	2.3139	1.0	13.3889	2.3147	1.2901	-0.0354
2.238	10.0	1790	2.2843	1.0	13.5145	2.2851	1.2865	-0.0222
2.1154	11.0	1969	2.2954	1.0	13.3767	2.2962	1.2876	-0.0272
2.0953	12.0	2148	2.4696	1.0	13.4115	2.4703	1.3121	-0.1051
2.1473	13.0	2327	2.2897	1.0	13.2635	2.2905	1.2867	-0.0246
2.2298	14.0	2506	2.2760	1.0	13.3458	2.2768	1.2866	-0.0185
2.1942	15.0	2685	2.6864	1.0	13.3188	2.6871	1.3466	-0.2020
2.2101	16.0	2864	2.3054	1.0	13.3623	2.3061	1.2888	-0.0316
2.1232	17.0	3043	2.3875	1.0	13.3129	2.3882	1.2996	-0.0683
2.1782	18.0	3222	2.4541	1.0	13.4551	2.4549	1.3095	-0.0982
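
To help read the Mse, Mae, and R2 columns, here is a minimal pure-Python sketch of the three metrics. It assumes the card uses the standard definitions, with R2 computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function names and toy data are illustrative and not taken from the training code. Under that assumption, R2 = 1 - Mse / Var(labels), so each table row implies a label variance, and the rows can be checked against one another.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def implied_label_variance(mse_value, r2_value):
    """Label variance implied by a row, from R2 = 1 - Mse / Var."""
    return mse_value / (1.0 - r2_value)


# Toy data (illustrative only): always predicting the mean label gives R2 = 0.
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_mean_pred = [2.5] * len(toy_true)
print(mse(toy_true, toy_mean_pred), mae(toy_true, toy_mean_pred), r2(toy_true, toy_mean_pred))

# Consistency check using Mse and R2 from the epoch 0 and epoch 18 rows.
var_epoch0 = implied_label_variance(7.0610, -2.1586)
var_epoch18 = implied_label_variance(2.4549, -0.0982)
print(round(var_epoch0, 3), round(var_epoch18, 3))
```

The two rows imply nearly the same label variance, which is consistent with the assumed definition. Under that definition, a negative R2 means the predictions have a higher squared error than always predicting the mean evaluation label, and every row in the table has a negative R2.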

Safetensors model size: 58.7M params
Tensor type: F32
