Ukrainian/Russian Manipulation Detector - XLM-RoBERTa

Model Description

This model detects propaganda and manipulation techniques in Ukrainian and Russian text. It is a fine-tuned version of FacebookAI/xlm-roberta-large trained on a bilingual subset of the UNLP 2025 Shared Task dataset for multi-label classification of manipulation techniques.

Because XLM-RoBERTa is pretrained on multilingual text, the model can process both Ukrainian and Russian, including code-mixed text that switches between the two.

Task: Manipulation Technique Classification

The model performs multi-label text classification, identifying 5 major manipulation categories. A single text can contain multiple techniques.

Manipulation Categories

Loaded Language: The use of words and phrases with a strong emotional connotation (positive or negative) to influence the audience.
Glittering Generalities: Exploitation of people's positive attitude towards abstract concepts such as “justice,” “freedom,” “democracy,” “patriotism,” “peace,” “happiness,” “love,” “truth,” “order,” etc. These words and phrases are intended to provoke strong emotional reactions and feelings of solidarity without providing specific information or arguments.
Euphoria: Using an event that causes euphoria or happiness, or any positive event, to boost morale. This manipulation is often used to mobilize the population.
Appeal to Fear: The misuse of fear (often based on stereotypes or prejudices) to support a particular proposal.
FUD (Fear, Uncertainty, Doubt): Presenting information in a way that sows uncertainty and doubt, causing fear. This technique is a subtype of the appeal to fear.
Bandwagon/Appeal to People: An attempt to persuade the audience to join and take action because “others are doing the same thing.”
Thought-Terminating Cliché: Commonly used phrases that mitigate cognitive dissonance and block critical thinking.
Whataboutism: Discrediting the opponent's position by accusing them of hypocrisy without directly refuting their arguments.
Cherry Picking: Selective use of data or facts that support a hypothesis while ignoring counterarguments.
Straw Man: Distorting the opponent's position by replacing it with a weaker or outwardly similar one and refuting it instead.

Training Data

The model was trained on the dataset from the UNLP 2025 Shared Task on manipulation technique classification.

Dataset: UNLP 2025 Techniques Classification
Source Texts: Ukrainian and Russian texts from a larger multilingual dataset.
Task: Multi-label classification.

Training Configuration

The model was fine-tuned using the following hyperparameters:

Parameter	Value
Base Model	`FacebookAI/xlm-roberta-large`
Learning Rate	`2e-5`
Train Batch Size	`16`
Eval Batch Size	`32`
Epochs	`10`
Max Sequence Length	`512`
Optimizer	AdamW
Loss Function	`BCEWithLogitsLoss` (with class weights)
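
The table does not say how the class weights were derived. One common convention, sketched below as an assumption rather than the recipe used for this model, is to weight each label's positive term by the ratio of negative to positive examples. The function name `compute_pos_weights` and the toy labels are hypothetical.

```python
from typing import List


def compute_pos_weights(label_rows: List[List[int]]) -> List[float]:
    """Return one weight per label: (# negative examples) / (# positive examples)."""
    num_examples = len(label_rows)
    num_labels = len(label_rows[0])
    weights = []
    for j in range(num_labels):
        positives = sum(row[j] for row in label_rows)
        negatives = num_examples - positives
        # Fall back to a neutral weight for labels that never occur in the sample.
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Toy multi-hot labels: 4 texts, 3 hypothetical labels.
toy_labels = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
pos_weights = compute_pos_weights(toy_labels)
print(pos_weights)
```

With PyTorch, a list like this could be passed as `torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(pos_weights))`, so that rare labels contribute more to the loss when they are missed.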

Usage

Installation

First, install the necessary libraries:

pip install transformers torch sentencepiece

Quick Start

Here is how to use the model to classify a single piece of text:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Define model and label names
model_name = "olehmell/ukr-rus-manipulation-detector-xlm-roberta" # Hypothetical model name
labels = [
    'emotional_manipulation', 
    'fear_appeals', 
    'bandwagon_effect', 
    'selective_truth', 
    'cliche'
]

# Load pretrained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare text (can be Ukrainian or Russian)
text = "Все эксперты уже давно это подтвердили, только вы не понимаете, что происходит на самом деле."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Get detected techniques
threshold = 0.5
detected_techniques = {}
for i, score in enumerate(predictions[0]):
    if score > threshold:
        detected_techniques[labels[i]] = f"{score:.2f}"

if detected_techniques:
    print("Detected techniques:")
    for technique, score in detected_techniques.items():
        print(f"- {technique} (Score: {score})")
else:
    print("No manipulation techniques detected.")
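
The hard-coded `labels` list above assumes a particular output order. A more defensive option, sketched here, is to map scores to names through an `id2label` dictionary such as the one Transformers exposes as `model.config.id2label`. Whether this checkpoint's config carries meaningful label names is an assumption; the helper `decode_scores` and the example scores are hypothetical.

```python
from typing import Dict, List


def decode_scores(
    scores: List[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Keep labels whose score exceeds the threshold, highest score first."""
    kept = {id2label[i]: s for i, s in enumerate(scores) if s > threshold}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Hypothetical mapping that mirrors the label order used in the Quick Start.
id2label = {
    0: "emotional_manipulation",
    1: "fear_appeals",
    2: "bandwagon_effect",
    3: "selective_truth",
    4: "cliche",
}
example_scores = [0.91, 0.12, 0.67, 0.40, 0.05]  # made-up sigmoid outputs
detected = decode_scores(example_scores, id2label)
print(detected)
```

In the Quick Start this would be called as `decode_scores(predictions[0].tolist(), model.config.id2label)`.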

Performance

The model achieves the following performance on the evaluation set:

Metric	Value
F1 Macro	0.44
F1 Micro	TBD
Hamming Loss	TBD
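
For readers unfamiliar with these metrics, the following sketch shows how they are commonly defined for multi-label predictions. It uses made-up toy labels, not the model's evaluation data, and the function names are hypothetical. Macro F1 averages per-label F1 scores, micro F1 pools the counts across labels first, and Hamming loss is the fraction of label slots predicted incorrectly.

```python
from typing import Dict, List


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    num_labels = len(y_true[0])
    per_label_f1 = []
    total_tp = total_fp = total_fn = 0
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label_f1.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    return {
        "f1_macro": sum(per_label_f1) / num_labels,
        "f1_micro": f1(total_tp, total_fp, total_fn),
        # Every false positive or false negative is one wrong label slot.
        "hamming_loss": (total_fp + total_fn) / (len(y_true) * num_labels),
    }


# Toy example: 3 texts, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

Because macro F1 gives every label equal weight, a rare label with poor recall pulls it down more than it pulls down micro F1.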

Limitations

Language Specificity: The model is optimized for Ukrainian and Russian. Performance on other languages is not guaranteed.
Domain Sensitivity: The model was trained primarily on political and social media discourse, so its performance may vary on other text domains (e.g., scientific, literary).
Context Length: The model is limited to texts up to 512 tokens. Longer documents must be chunked or truncated.
Class Imbalance: Some manipulation techniques are underrepresented in the training data, which may affect their detection accuracy.
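
One way to handle the context-length limit is to split a long document into overlapping windows, score each window, and keep the highest score per label. The sketch below illustrates that idea on plain lists of token ids. The helper names, the overlap size, and the max-pooling rule are all assumptions, not part of the released model.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_len: int = 512, stride: int = 128) -> List[List[int]]:
    """Split token ids into windows of at most max_len; neighbours share `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must be non-negative and smaller than max_len")
    step = max_len - stride
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window reached the end of the document
        start += step
    return chunks


def max_pool_scores(chunk_scores: List[List[float]]) -> List[float]:
    """Per label, keep the highest score seen in any chunk."""
    return [max(label_scores) for label_scores in zip(*chunk_scores)]


# Toy run: 10 token ids, windows of 4 with 1 token of overlap.
chunks = chunk_token_ids(list(range(10)), max_len=4, stride=1)
print(chunks)

# Made-up per-chunk scores for two labels.
pooled = max_pool_scores([[0.1, 0.8], [0.6, 0.2]])
print(pooled)
```

In practice `max_len` should leave room for the special tokens the tokenizer adds around each window. Hugging Face fast tokenizers can also produce overlapping windows directly via `return_overflowing_tokens=True` together with `stride`.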

Ethical Considerations

Purpose: This model is intended as a tool to support media literacy and critical thinking, not as an arbiter of truth.
Human Oversight: Model outputs should be interpreted with human judgment and a full understanding of the context. The model should not be used to automatically censor content.
Potential Biases: The model may reflect biases present in the training data.

Citation

If you use this model in your research, please cite the following:

@misc{ukrainian-russian-manipulation-xlm-roberta-2025,
  author = {Oleh Mell},
  title = {Ukrainian/Russian Manipulation Detector - XLM-RoBERTa},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/olehmell/ukr-rus-manipulation-detector-xlm-roberta}
}

@inproceedings{unlp2025shared,
  title={UNLP 2025 Shared Task on Techniques Classification},
  author={UNLP Workshop Organizers},
  booktitle={UNLP 2025 Workshop},
  year={2025},
  url={https://github.com/unlp-workshop/unlp-2025-shared-task}
}

License

This model is licensed under the Apache 2.0 License.

Acknowledgments

Thanks to the organizers of the UNLP 2025 Workshop for providing the dataset.
