---
library_name: transformers
pipeline_tag: text-classification
base_model: EuroBERT/EuroBERT-210m
base_model_relation: finetune
tags:
- eurobert
- fine-tuned
- transformers
- pytorch
- sequence-classification
- multiclass
- geopolitics
- multilingual
language:
- en
- de
- fr
- es
- it
---

# EuroBERT Geopolitical Classifier (Multiclass)

A fine-tuned version of `EuroBERT/EuroBERT-210m` that detects and categorizes geopolitical themes in (European) news text.

- **Task:** Sequence classification (single-label multiclass)
- **Labels:** 11 (10 geopolitical topics plus `non_geopol`)
- **Intended use:** Topic categorization of news on geopolitical tensions (best performance on full article-level text)
- **Languages:** English, German, French, Spanish, Italian
- **Framework:** 🤗 Transformers (PyTorch)

---

## Quick start

### Inference with `transformers`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "durrani95/eurobert-geopolitical-multiclass"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

texts = [
    "Russia cut off gas supplies to Europe amid rising tensions.",
    "Terrorist activity has increased along the southern border.",
    "New sanctions were imposed on financial institutions.",
    "Talks at the UN Security Council failed to reach consensus.",
    "Tariffs on soybeans are applied to pressure China into a deal with the US",
    "Tom and Jerry have a fight! The mouse finally had enough.",
]

# Tokenize as one padded batch; inputs longer than 512 tokens are truncated.
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

# Inference only, so gradients are not needed.
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)

# Report the highest-probability label for each text.
for text, p in zip(texts, probs):
    label_id = int(p.argmax())
    label = model.config.id2label[label_id]
    confidence = float(p[label_id])
    print(f"{label:>28} {confidence:6.2%} | {text}")
```

## Category Definitions

| Category | Description | Example |
|-----------|--------------|----------|
| **war_military_conflict** | Armed conflicts, military operations, or war-related issues involving states or armed groups. | Russia’s invasion of Ukraine |
| **terrorism_insurgency** | Terrorist attacks, counter-terrorism operations, or insurgent activity. | 9/11 attacks |
| **cyber_warfare** | Cyberattacks or hacking by foreign states or international actors with strategic motives. | North Korea’s Sony hack |
| **trade_disputes** | Tensions between states over trade policy, tariffs, or retaliation. | U.S.–China trade wars |
| **financial_sanctions** | Economic penalties imposed by countries against targeted states, entities, or individuals. | U.S. sanctions on Iran’s banking sector |
| **regional_disintegration** | Political developments that threaten the cohesion of existing regional entities. | Brexit |
| **energy_resource_conflicts** | Disputes over energy access, distribution, or natural resource control. | OPEC oil disputes |
| **global_governance** | Tensions involving international institutions or multilateral diplomacy. | NATO expansion |
| **nuclear_proliferation** | Issues concerning the spread or control of nuclear weapons. | Iran nuclear deal |
| **territorial_disputes** | Conflicts over land or maritime boundaries. | South China Sea tensions |
| **non_geopol** | Texts without geopolitical relevance. | Domestic politics or economic updates |
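
The table can also be read as a binary split: ten geopolitical categories versus `non_geopol`. The sketch below shows one way to collapse a predicted label into that yes/no flag. The constant and function names are hypothetical, and it assumes the label strings returned by `model.config.id2label` match the category names in the table exactly.

```python
# Label names copied from the Category Definitions table.
# Assumption: these strings match the values in model.config.id2label.
GEOPOLITICAL_LABELS = (
    "war_military_conflict",
    "terrorism_insurgency",
    "cyber_warfare",
    "trade_disputes",
    "financial_sanctions",
    "regional_disintegration",
    "energy_resource_conflicts",
    "global_governance",
    "nuclear_proliferation",
    "territorial_disputes",
)
NON_GEOPOLITICAL_LABEL = "non_geopol"
ALL_LABELS = GEOPOLITICAL_LABELS + (NON_GEOPOLITICAL_LABEL,)


def is_geopolitical(label: str) -> bool:
    """Collapse an 11-way prediction into a geopolitical / not-geopolitical flag."""
    if label not in ALL_LABELS:
        raise ValueError(f"Unknown label: {label!r}")
    return label != NON_GEOPOLITICAL_LABEL


print(is_geopolitical("financial_sanctions"))  # True
print(is_geopolitical("non_geopol"))           # False
```

Raising on unknown labels is a deliberate choice here: it surfaces a mismatch between this list and the model configuration early instead of silently treating the text as geopolitical.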

---

## Training & Configuration

- **Base model:** `EuroBERT/EuroBERT-210m`
- **Objective:** Cross-entropy (single-label multiclass)
- **Number of labels:** 11
- **Data:** European news text labeled across geopolitical topics
- **Hardware:** A100 GPU
- **Epochs:** 1
- **Optimizer:** AdamW with linear scheduler

### Training setup

| Parameter | Value |
|------------|--------|
| Learning rate | 3e-5 |
| Desired (effective) batch size | 64 |
| Actual GPU batch size | 16 |
| Gradient accumulation | 4 steps |
| Weight decay | 1e-5 |
| Betas | (0.9, 0.95) |
| Epsilon | 1e-8 |
| Max epochs | 1 |
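
The relation between the batch-size rows can be checked with a little arithmetic. The sketch below collects the table values into a plain dictionary whose keys follow the argument names of `transformers.TrainingArguments`. That mapping is an assumption made for illustration: the original training script is not part of this card, and the calculation assumes a single GPU.

```python
# Values taken from the training setup table.
# Assumption: key names mirror transformers.TrainingArguments; the actual
# training script is not shown in this card, so this mapping is illustrative.
per_device_batch_size = 16
gradient_accumulation_steps = 4

training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "weight_decay": 1e-5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "adam_epsilon": 1e-8,
    "num_train_epochs": 1,
    "lr_scheduler_type": "linear",
}

# On a single GPU, the effective batch size is the per-device batch size
# multiplied by the number of gradient accumulation steps.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 64
```

With `transformers` installed, a dictionary like this could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left for the reader to choose.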

---

## Limitations & Risks

- May be sensitive to domain shift (for example, non-news or social media text).
- Predicts one dominant label per text; it is not multi-label.
- Multilingual performance can vary across languages and registers.
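
Because only one label is returned per text, it can help to look at the runner-up classes when an article touches several themes. The following is a minimal, dependency-free sketch that ranks classes by softmax probability. The helper names, the three-label mapping, and the logits are hypothetical. With the real model you would pass one row of `logits` converted via `.tolist()` together with `model.config.id2label`. The probabilities come from a single-label softmax, so they are a rough hint at secondary themes and not calibrated multi-label scores.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]


# Hypothetical logits and a hypothetical 3-label mapping, for illustration only.
example_id2label = {0: "trade_disputes", 1: "financial_sanctions", 2: "non_geopol"}
example_logits = [2.0, 1.5, -1.0]

for label, prob in top_k_labels(example_logits, example_id2label, k=2):
    print(f"{label:>22} {prob:.2%}")
```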

---

## How to cite

If you use this model, please cite this repository and the EuroBERT base model.