---
model_name: transformer_multi_head_bert_updated
base_model: GroNLP/bert-base-dutch-cased
language: nl
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
  - dutch
  - regression
  - multi-head
  - bert
  - text-quality
datasets:
  - proprietary
target_names:
  - delta_cola_to_final
  - delta_perplexity_to_final_large
  - iter_to_final_simplified
  - robbert_delta_blurb_to_final
metrics:
  per_epoch:
    - epoch: 1
      valid_loss: 0.016363
      delta_cola_to_final:
        rmse: 0.149753
        r2: 0.36889
      delta_perplexity_to_final_large:
        rmse: 0.099918
        r2: 0.648474
      iter_to_final_simplified:
        rmse: 0.138463
        r2: 0.818398
      robbert_delta_blurb_to_final:
        rmse: 0.117767
        r2: 0.729513
      mean_rmse: 0.126476
    - epoch: 2
      valid_loss: 0.01522
      delta_cola_to_final:
        rmse: 0.146628
        r2: 0.394957
      delta_perplexity_to_final_large:
        rmse: 0.10185
        r2: 0.634748
      iter_to_final_simplified:
        rmse: 0.127215
        r2: 0.846706
      robbert_delta_blurb_to_final:
        rmse: 0.113245
        r2: 0.749883
      mean_rmse: 0.122235
    - epoch: 3
      valid_loss: 0.015208
      delta_cola_to_final:
        rmse: 0.146956
        r2: 0.392247
      delta_perplexity_to_final_large:
        rmse: 0.098563
        r2: 0.657945
      iter_to_final_simplified:
        rmse: 0.127813
        r2: 0.84526
      robbert_delta_blurb_to_final:
        rmse: 0.114824
        r2: 0.742861
      mean_rmse: 0.122039
    - epoch: 4
      valid_loss: 0.015156
      delta_cola_to_final:
        rmse: 0.142936
        r2: 0.425041
      delta_perplexity_to_final_large:
        rmse: 0.099919
        r2: 0.648468
      iter_to_final_simplified:
        rmse: 0.128406
        r2: 0.843822
      robbert_delta_blurb_to_final:
        rmse: 0.117143
        r2: 0.732371
      mean_rmse: 0.122101
    - epoch: 5
      valid_loss: 0.015457
      delta_cola_to_final:
        rmse: 0.144705
        r2: 0.410725
      delta_perplexity_to_final_large:
        rmse: 0.100198
        r2: 0.646506
      iter_to_final_simplified:
        rmse: 0.131055
        r2: 0.837311
      robbert_delta_blurb_to_final:
        rmse: 0.116936
        r2: 0.733315
      mean_rmse: 0.123223
  test:
    aggregate_rmse: 0.0769
    aggregate_r2: 0.8425
    mean_rmse: 0.121
---

# transformer_multi_head_bert_updated
A multi-head transformer regression model based on BERT (GroNLP/bert-base-dutch-cased), fine-tuned to predict four normalized delta scores for Dutch book reviews. The four output heads are:
- delta_cola_to_final
- delta_perplexity_to_final_large
- iter_to_final_simplified
- robbert_delta_blurb_to_final
⚠️ The order of these outputs is fixed and must be preserved exactly as listed above during inference.
Changing the order misassigns the predicted values to their respective targets.
Additionally, a final aggregate score is provided (mean of the four heads).
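The fixed head order and the aggregate can be captured in a small helper; a minimal sketch (the helper name and the plain-list input are illustrative, not part of the model's API):

```python
# Fixed output order of the four regression heads (see list above).
TARGETS = [
    "delta_cola_to_final",
    "delta_perplexity_to_final_large",
    "iter_to_final_simplified",
    "robbert_delta_blurb_to_final",
]

def label_outputs(head_values):
    """Map the model's four head outputs, in their fixed order,
    to named scores, plus the aggregate (simple mean of the heads)."""
    if len(head_values) != len(TARGETS):
        raise ValueError(f"expected {len(TARGETS)} head outputs")
    scores = dict(zip(TARGETS, head_values))
    scores["aggregate"] = sum(head_values) / len(head_values)
    return scores
```

Keeping the order in one shared constant avoids the misassignment the warning above describes.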
## 📈 Training & Evaluation

- Base model: GroNLP/bert-base-dutch-cased
- Fine-tuning: 5 epochs on a proprietary dataset
- Output heads: 4
- Problem type: multi-head regression
### Per-Epoch Validation Metrics
| Epoch | Val Loss | ΔCoLA RMSE / R² | ΔPerp RMSE / R² | Iter RMSE / R² | Blurb RMSE / R² | Mean RMSE |
|---|---|---|---|---|---|---|
| 1 | 0.01636 | 0.1498 / 0.3689 | 0.0999 / 0.6485 | 0.1385 / 0.8184 | 0.1178 / 0.7295 | 0.1265 |
| 2 | 0.01522 | 0.1466 / 0.3950 | 0.1019 / 0.6347 | 0.1272 / 0.8467 | 0.1132 / 0.7499 | 0.1222 |
| 3 | 0.01521 | 0.1470 / 0.3922 | 0.0986 / 0.6579 | 0.1278 / 0.8453 | 0.1148 / 0.7429 | 0.1220 |
| 4 | 0.01516 | 0.1429 / 0.4250 | 0.0999 / 0.6485 | 0.1284 / 0.8438 | 0.1171 / 0.7324 | 0.1221 |
| 5 | 0.01546 | 0.1447 / 0.4107 | 0.1002 / 0.6465 | 0.1311 / 0.8373 | 0.1169 / 0.7333 | 0.1232 |
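The per-head RMSE and R² values above follow the standard definitions, and the mean RMSE column is the plain average over the four heads; a minimal reference implementation:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Mean RMSE, as reported per epoch, averages the four per-head RMSEs,
# e.g. epoch 1: (0.149753 + 0.099918 + 0.138463 + 0.117767) / 4 ≈ 0.1265.
```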
## ✅ Final Aggregate Performance (Test)
| Metric | Value |
|---|---|
| Aggregate RMSE | 0.0769 |
| Aggregate R² | 0.8425 |
| Mean RMSE (heads) | 0.1210 |
## 🗂️ Test Metrics (Per Target)
| Target | RMSE | R² |
|---|---|---|
| delta_cola_to_final | 0.1463 | 0.4286 |
| delta_perplexity_to_final_large | 0.0955 | 0.6802 |
| iter_to_final_simplified | 0.1255 | 0.8535 |
| robbert_delta_blurb_to_final | 0.1168 | 0.7319 |
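The per-target test RMSEs above average back to the head-level mean RMSE reported in the aggregate table; a quick consistency check:

```python
# Per-target test RMSEs from the table above.
test_rmse = {
    "delta_cola_to_final": 0.1463,
    "delta_perplexity_to_final_large": 0.0955,
    "iter_to_final_simplified": 0.1255,
    "robbert_delta_blurb_to_final": 0.1168,
}

# Simple average over the four heads.
mean_rmse = sum(test_rmse.values()) / len(test_rmse)
print(round(mean_rmse, 4))  # 0.121, the "Mean RMSE (heads)" value
```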
## 🏷️ Notes

- Base model: GroNLP/bert-base-dutch-cased
- Fine-tuned for multi-head regression on Dutch book reviews
- Trained for 5 epochs on a proprietary dataset
- Sigmoid activation built into each head
- Re-aggregation: simple average of the four head outputs
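The notes above describe one sigmoid-activated regression head per target on top of the encoder, with the aggregate as a simple average. A hypothetical sketch of such a head stack (class and argument names are illustrative; the actual custom architecture ships with the model via `trust_remote_code`):

```python
import torch
import torch.nn as nn

class MultiHeadRegression(nn.Module):
    """Illustrative head stack: one sigmoid-activated linear
    regressor per target on top of the encoder's pooled output."""

    def __init__(self, hidden_size=768, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_heads)]
        )

    def forward(self, pooled):
        # Sigmoid bounds each predicted delta score to (0, 1).
        preds = torch.cat(
            [torch.sigmoid(head(pooled)) for head in self.heads], dim=-1
        )
        aggregate = preds.mean(dim=-1, keepdim=True)  # simple average
        return preds, aggregate
```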
## 🛠️ Training Arguments

- `num_train_epochs=5`
- `per_device_train_batch_size=8`
- `per_device_eval_batch_size=16`
- `gradient_accumulation_steps=2`
- `learning_rate=2e-5`
- `weight_decay=0.01`
- `eval_strategy="epoch"`, `save_strategy="epoch"`, `logging_strategy="epoch"`
- `load_best_model_at_end=True`
- `metric_for_best_model="mean_rmse"`, `greater_is_better=False`
- `bf16` enabled if supported, else `fp16` enabled
- `push_to_hub=True` with model ID `Felixbrk/bert-base-dutch-cased-multi-score-tuned-positive`, `hub_strategy="end"`
- Early stopping with a patience of 2 epochs
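Assuming the standard `transformers.TrainingArguments` API, the settings above correspond roughly to the sketch below (`output_dir` is a placeholder; early stopping is handled by the separate `EarlyStoppingCallback`):

```python
import torch
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="out",  # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="mean_rmse",
    greater_is_better=False,
    # bf16 if the hardware supports it, else fp16.
    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
    fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
    push_to_hub=True,
    hub_model_id="Felixbrk/bert-base-dutch-cased-multi-score-tuned-positive",
    hub_strategy="end",
)
# Early stopping: pass EarlyStoppingCallback(early_stopping_patience=2)
# in the Trainer's callbacks list.
```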
## ⚠️ Important

- Always load this model with `trust_remote_code=True`, as it uses a custom multi-head regression architecture.
- Maintain the output order exactly for correct interpretation of the results.