router-mmBERT-base-6e-5-batch64

This model is a fine-tuned version of jhu-clsp/mmBERT-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6451
  • Accuracy: 0.6251
  • Precision: 0.6246
  • Recall: 0.6251
  • F1: 0.6229
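
For a quick inference check, the sketch below loads this checkpoint with the standard transformers Auto classes. It assumes the model was saved as a sequence-classification checkpoint under the repo id AmirMohseni/router-mmBERT-base-6e-5-batch64; the input string is illustrative, and the label names come from whatever id2label mapping is stored in the config.

```python
# Minimal inference sketch (assumes a standard sequence-classification
# head saved with the Hugging Face Trainer; the example text is made up).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "AmirMohseni/router-mmBERT-base-6e-5-batch64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("Example query to route", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(pred, model.config.id2label.get(pred, str(pred)))
```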

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 6e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 2
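
For reference, these hyperparameters map onto the Hugging Face TrainingArguments roughly as shown below. The output directory and the evaluation cadence (every 100 steps, matching the table in the next section) are assumptions; everything else mirrors the list above.

```python
# Hypothetical TrainingArguments mirroring the listed hyperparameters.
# output_dir and the eval cadence are assumptions, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="router-mmBERT-base-6e-5-batch64",
    learning_rate=6e-5,
    per_device_train_batch_size=32,   # train_batch_size: 32
    per_device_eval_batch_size=32,    # eval_batch_size: 32
    gradient_accumulation_steps=2,    # total_train_batch_size: 64
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",        # AdamW, betas=(0.9, 0.999), eps=1e-8
    seed=42,
    eval_strategy="steps",            # evaluation every 100 steps (see table)
    eval_steps=100,
)
```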

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|--------|------|-----------------|----------|-----------|--------|--------|
| 1.379         | 0.0929 | 100  | 0.7915          | 0.5262   | 0.6719    | 0.5262 | 0.3662 |
| 1.3923        | 0.1859 | 200  | 0.6671          | 0.6013   | 0.6284    | 0.6013 | 0.5650 |
| 1.3808        | 0.2788 | 300  | 0.6831          | 0.5710   | 0.6836    | 0.5710 | 0.4720 |
| 1.358         | 0.3717 | 400  | 0.6631          | 0.5903   | 0.5927    | 0.5903 | 0.5904 |
| 1.308         | 0.4647 | 500  | 0.6580          | 0.6024   | 0.6031    | 0.6024 | 0.5955 |
| 1.3378        | 0.5576 | 600  | 0.6953          | 0.5295   | 0.5883    | 0.5295 | 0.4701 |
| 1.3219        | 0.6506 | 700  | 0.6657          | 0.5765   | 0.5888    | 0.5765 | 0.5710 |
| 1.3212        | 0.7435 | 800  | 0.6580          | 0.5958   | 0.5953    | 0.5958 | 0.5954 |
| 1.2893        | 0.8364 | 900  | 0.6612          | 0.5919   | 0.6025    | 0.5919 | 0.5883 |
| 1.2436        | 0.9294 | 1000 | 0.6543          | 0.6151   | 0.6225    | 0.6151 | 0.6011 |
| 1.3296        | 1.0223 | 1100 | 0.6509          | 0.6157   | 0.6311    | 0.6157 | 0.5941 |
| 1.2985        | 1.1152 | 1200 | 0.6564          | 0.6151   | 0.6157    | 0.6151 | 0.6101 |
| 1.1993        | 1.2082 | 1300 | 0.6562          | 0.6013   | 0.6085    | 0.6013 | 0.5997 |
| 1.2665        | 1.3011 | 1400 | 0.6832          | 0.5699   | 0.5980    | 0.5699 | 0.5520 |
| 1.2523        | 1.3941 | 1500 | 0.6548          | 0.6068   | 0.6062    | 0.6068 | 0.6062 |
| 1.1899        | 1.4870 | 1600 | 0.6545          | 0.6173   | 0.6166    | 0.6173 | 0.6162 |
| 1.2433        | 1.5799 | 1700 | 0.6487          | 0.6240   | 0.6264    | 0.6240 | 0.6169 |
| 1.2378        | 1.6729 | 1800 | 0.6507          | 0.6201   | 0.6196    | 0.6201 | 0.6197 |
| 1.2489        | 1.7658 | 1900 | 0.6441          | 0.6322   | 0.6340    | 0.6322 | 0.6268 |
| 1.2625        | 1.8587 | 2000 | 0.6448          | 0.6273   | 0.6271    | 0.6273 | 0.6245 |
| 1.3145        | 1.9517 | 2100 | 0.6451          | 0.6251   | 0.6246    | 0.6251 | 0.6229 |
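
Note that Recall equals Accuracy in every evaluation row, which is what weighted (support-proportional) averaging produces. A compute_metrics function along the lines below would yield these columns; the weighted averaging is inferred from the numbers, not stated on the card.

```python
# Hypothetical compute_metrics for the Trainer; weighted averaging is an
# assumption inferred from recall == accuracy in every evaluation row.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```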

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1