---
library_name: transformers
language:
- multilingual
- bn
- cs
- de
- en
- et
- fi
- fr
- gu
- ha
- hi
- is
- ja
- kk
- km
- lt
- lv
- pl
- ps
- ru
- ta
- tr
- uk
- xh
- zh
- zu
license: apache-2.0
base_model: answerdotai/ModernBERT-large
tags:
- quality-estimation
- regression
- generated_from_trainer
datasets:
- ymoslem/wmt-da-human-evaluation
model-index:
- name: Quality Estimation for Machine Translation
  results:
  - task:
      type: regression
    dataset:
      name: ymoslem/wmt-da-human-evaluation
      type: QE
    metrics:
    - name: Pearson Correlation
      type: Pearson
      value: 0.4458
    - name: Mean Absolute Error
      type: MAE
      value: 0.1876
    - name: Root Mean Squared Error
      type: RMSE
      value: 0.2393
    - name: R-Squared
      type: R2
      value: 0.1987
metrics:
- pearsonr
- mae
- r_squared
---
					
						
# Quality Estimation for Machine Translation

This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
on the [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation) dataset.

It achieves the following results on the evaluation set:
- Loss: 0.0564

## Model description

This model performs reference-free quality estimation (QE) of machine translation (MT): given a source segment and its machine translation, it predicts a quality score without needing a reference translation.
					
						
## Training procedure

### Training hyperparameters

This model uses the full maximum length of the tokenizer, which is 8192.
A version with a maximum length of 512 is available at [ymoslem/ModernBERT-large-qe-maxlen512-v1](https://huggingface.co/ymoslem/ModernBERT-large-qe-maxlen512-v1).

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: fused PyTorch AdamW (`OptimizerNames.ADAMW_TORCH_FUSED`) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 10000
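
To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would evolve over the 10000 training steps. It assumes no warmup, since no warmup setting is listed above, and that the rate decays linearly to zero at the final step; `linear_lr` is a hypothetical helper written for illustration, not part of the training code.

```python
def linear_lr(step, base_lr=8e-05, total_steps=10_000, warmup_steps=0):
    """Learning rate at `step`: optional linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at a few points of the run
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate is halved by step 5000 and reaches zero at step 10000.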
					
						
### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.0631        | 0.1004 | 1000  | 0.0674          |
| 0.0614        | 0.2007 | 2000  | 0.0599          |
| 0.0578        | 0.3011 | 3000  | 0.0585          |
| 0.0585        | 0.4015 | 4000  | 0.0579          |
| 0.0568        | 0.5019 | 5000  | 0.0570          |
| 0.057         | 0.6022 | 6000  | 0.0568          |
| 0.0579        | 0.7026 | 7000  | 0.0567          |
| 0.0573        | 0.8030 | 8000  | 0.0565          |
| 0.0568        | 0.9033 | 9000  | 0.0564          |
| 0.0571        | 1.0037 | 10000 | 0.0564          |
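
The Epoch and Step columns, together with `train_batch_size`, allow a rough estimate of the training set size. The sketch below assumes that each optimizer step consumes exactly 128 examples (a single device and no gradient accumulation, neither of which is stated in this card), so the result is an approximation only.

```python
# Values taken from the last row of the table and from the hyperparameters
train_batch_size = 128
final_step = 10_000
final_epoch = 1.0037

# One epoch corresponds to this many optimizer steps
steps_per_epoch = final_step / final_epoch

# Assumption: every step processes exactly `train_batch_size` examples
approx_train_examples = steps_per_epoch * train_batch_size

print(f"steps per epoch: about {steps_per_epoch:,.0f}")
print(f"training examples: about {approx_train_examples:,.0f}")
```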
					
						
### Framework versions

- Transformers 4.48.0
- PyTorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
					
						
## Inference

1. Install the required libraries.

```bash
pip3 install --upgrade datasets accelerate transformers
pip3 install --upgrade flash_attn triton
```

2. Load the test dataset.

```python
from datasets import load_dataset

test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
                            split="test",
                            trust_remote_code=True
                           )
print(test_dataset)
```
					
						
3. Load the model and tokenizer.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model_name = "ymoslem/ModernBERT-large-qe-v1"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()
```
					
						
4. Prepare the dataset. Each source segment `src` and target segment `tgt` are separated by the `sep_token`, which is `'</s>'` for ModernBERT.

```python
sep_token = tokenizer.sep_token
input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
```
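
To illustrate the input format on a single pair, here is a small self-contained sketch. `build_qe_input` is a hypothetical helper, the example sentences are made up, and the separator is passed in as a plain string using the value cited above; in real use, read it from `tokenizer.sep_token`.

```python
def build_qe_input(src, tgt, sep_token):
    """Join a source segment and its translation in the format used above."""
    return f"{src} {sep_token} {tgt}"


# Stand-in separator; in practice use tokenizer.sep_token
sep_token = "</s>"

# Made-up example pair
example = build_qe_input("Good morning.", "Guten Morgen.", sep_token)
print(example)
```

The model scores the combined string, so source and translation must always appear in this order.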
					
						
5. Generate predictions.

The model is a regressor: if you print `model.config.problem_type`, the output is `regression`.
You can still run it through the "text-classification" pipeline, in which case the `score` field of each result holds the predicted quality score (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)):

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model=model_name,
                      tokenizer=tokenizer,
                      device=0,
                     )

predictions = classifier(input_test_texts,
                         batch_size=128,
                         truncation=True,
                         padding="max_length",
                         max_length=tokenizer.model_max_length,
                        )
predictions = [prediction["score"] for prediction in predictions]
```
					
						
Alternatively, you can use a more elaborate version of the code, which is slightly faster and gives more control.

```python
from torch.utils.data import DataLoader
import torch
from tqdm.auto import tqdm

# Tokenization function
def process_batch(batch, tokenizer, device):
    sep_token = tokenizer.sep_token
    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
    tokens = tokenizer(input_texts,
                       truncation=True,
                       padding="max_length",
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt",
                      ).to(device)
    return tokens


# Create a DataLoader for batching
test_dataloader = DataLoader(test_dataset,
                             batch_size=128,   # Adjust batch size as needed
                             shuffle=False)

# List to store all predictions
predictions = []

with torch.no_grad():
    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):

        tokens = process_batch(batch, tokenizer, device)

        # Forward pass: Generate model's logits
        outputs = model(**tokens)

        # Get logits (predictions)
        logits = outputs.logits

        # Extract the regression predicted values; squeeze only the last
        # dimension so that a final batch of size 1 still yields a list
        batch_predictions = logits.squeeze(-1)

        # Extend the list with the predictions
        predictions.extend(batch_predictions.tolist())
```
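
Once `predictions` is available, it can be compared with the human scores using the metrics reported in this card (Pearson correlation, MAE, RMSE and R-squared). The following is a minimal pure-Python sketch rather than the evaluation script behind the reported numbers; `regression_metrics` is a hypothetical helper and the values in the usage example are toy data.

```python
import math


def regression_metrics(predictions, references):
    """Return Pearson correlation, MAE, RMSE and R-squared for two score lists."""
    n = len(predictions)
    if n != len(references) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")

    mean_p = sum(predictions) / n
    mean_r = sum(references) / n

    # Sums of squares and cross-products around the means
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predictions, references))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_r = sum((r - mean_r) ** 2 for r in references)

    # Error terms between predictions and references
    sq_err = sum((p - r) ** 2 for p, r in zip(predictions, references))
    abs_err = sum(abs(p - r) for p, r in zip(predictions, references))

    return {
        "pearson": cov / math.sqrt(var_p * var_r),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "r_squared": 1 - sq_err / var_r,
    }


# Toy data, for illustration only
toy_predictions = [0.10, 0.40, 0.35, 0.80]
toy_references = [0.00, 0.50, 0.30, 0.90]
print(regression_metrics(toy_predictions, toy_references))
```

With the real data, pass `predictions` and the gold-score column of `test_dataset`; the name of that column is not given in this card, so check the printed dataset features first.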