---
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- multilabel
- framing
- entman
- media-bias
- news
- zero-shot
- bart-large-mnli
metrics:
- accuracy
- f1
- precision
- recall
---

# BERT Framing Classifier (Entman Multilabel) v1

This model is a multilabel classifier based on `bert-base-uncased`, fine-tuned to identify **framing functions** in news articles, inspired by Robert Entman's framing theory. The labels correspond to Entman's four framing functions:

- **Define problems**
- **Diagnose causes**
- **Make moral judgments**
- **Suggest remedies**

## 🔍 Use Case

This model is designed for media studies researchers, journalists, and analysts studying **media framing**, **bias**, and **narrative patterns** in English-language news coverage. It is especially useful for:

- News framing analysis in media studies.
- Detecting narrative intent in political discourse.
- Multilabel classification of complex textual claims.

Each label is treated as an independent binary classification task, so a single text can receive any combination of the four labels.

## 🧠 Model Details

- Base model: `bert-base-uncased`
- Framework: 🤗 Transformers with PyTorch
- Loss function: `BCEWithLogitsLoss` with class weights
- Label imbalance handled using positive weights and a stratified multi-label split

## 📊 Metrics

Evaluated on a stratified test set using:

- Accuracy
- F1-score (macro)
- Precision (macro)
- Recall (macro)
- ROC-AUC per class

Prediction thresholds were tuned per label to maximize F1-score.

### 📊 Objective

This experiment aimed to optimize a BERT-based sequence classification model for framing analysis using the Optuna hyperparameter tuning framework. The goal was to maximize the macro F1-score, which weights every label equally and is therefore well suited to multi-label classification with class imbalance.
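
To make the metric and threshold choices above concrete, here is a minimal sketch in plain Python of macro F1 and per-label threshold tuning for multi-label predictions. The helper names (`f1_binary`, `macro_f1`, `tune_thresholds`), the threshold grid, and the toy data are illustrative assumptions, not the code used to train or evaluate this model.

```python
from typing import List, Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for a single label; 0.0 when there are no true or predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 (rows are samples, columns are labels)."""
    n_labels = len(y_true[0])
    scores = [
        f1_binary([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


def tune_thresholds(y_true, probs, grid=None) -> List[float]:
    """For each label independently, pick the grid threshold with the highest F1."""
    grid = grid or [i / 20 for i in range(1, 20)]  # assumed grid: 0.05 ... 0.95
    best = []
    for j in range(len(y_true[0])):
        col_true = [row[j] for row in y_true]
        col_prob = [row[j] for row in probs]
        best.append(
            max(grid, key=lambda t: f1_binary(col_true, [int(p > t) for p in col_prob]))
        )
    return best


# Toy validation data with two labels (made up for illustration only).
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
probs = [[0.45, 0.10], [0.80, 0.35], [0.30, 0.60], [0.20, 0.20]]

thresholds = tune_thresholds(y_true, probs)
tuned = [[int(p > t) for p, t in zip(row, thresholds)] for row in probs]
default = [[int(p > 0.5) for p in row] for row in probs]
print("tuned thresholds:", thresholds)
print("macro F1 tuned vs. 0.5:", macro_f1(y_true, tuned), macro_f1(y_true, default))
```

Because each label's threshold is chosen independently and 0.5 is part of the grid, the tuned macro F1 on the tuning data can never be lower than the macro F1 at a fixed 0.5 cutoff. In practice, thresholds should be tuned on a validation split rather than the test set.
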
### ⚙️ Hyperparameters Tuned

- `learning_rate`: float, explored between ~1e-5 and ~5e-5
- `weight_decay`: float, explored between ~0.02 and ~0.25
- `num_train_epochs`: integer, between 2 and 4

## 🏆 Best Trial Summary

- **F1 Macro**: **0.8546**
- **Accuracy**: 0.5846
- **Precision Macro**: 0.8634
- **Recall Macro**: 0.8486
- **Best Hyperparameters**:
  - `learning_rate`: **4.62e-5**
  - `weight_decay`: **0.2275**
  - `num_train_epochs`: **4**

## 📈 Best Trial Training Metrics

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | Precision Macro | Recall Macro |
|-------|---------------|-----------------|----------|----------|-----------------|--------------|
| 1 | 0.4155 | 0.4499 | 0.3466 | 0.6998 | 0.8443 | 0.6265 |
| 2 | 0.3613 | 0.3414 | 0.4764 | 0.7862 | 0.8725 | 0.7266 |
| 3 | 0.2011 | 0.3179 | 0.5649 | 0.8495 | 0.8489 | 0.8506 |
| 4 | 0.1416 | 0.3508 | 0.5846 | **0.8546** | 0.8634 | 0.8486 |

![ROC Curve](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67aa9afb0b5d8d7d5e4e207a%2FjEq26g_ksRWyKLK2Yxsur.png)

## 📝 Notes

- All trials started from the `bert-base-uncased` checkpoint.
- Classification head weights were randomly initialized (`classifier.weight`, `classifier.bias`).
- Every trial was trained for its full number of epochs; early stopping was **not** used.
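
The hyperparameter search described in this card could be set up with Optuna roughly as follows. This is a sketch under stated assumptions, not the author's training script: the bounds are rounded from the approximate ranges quoted in this card, sampling the learning rate on a log scale is an assumption, and `suggest_hyperparameters`, `run_search`, and the `train_and_evaluate` callable are hypothetical names introduced for illustration.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the approximate ranges quoted in this card.

    `trial` is expected to be an `optuna.trial.Trial`.
    """
    return {
        # Log-scale sampling is an assumption; the card only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.02, 0.25),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 4),
    }


def run_search(train_and_evaluate, n_trials=10):
    """Run a study that maximizes the score returned by `train_and_evaluate`.

    `train_and_evaluate` is a hypothetical user-supplied callable that accepts the
    sampled hyperparameters as keyword arguments, fine-tunes the model, and
    returns the validation macro F1.
    """
    import optuna  # imported lazily so the search-space helper works without it

    def objective(trial):
        params = suggest_hyperparameters(trial)
        return train_and_evaluate(**params)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Calling `run_search(my_training_function)` with the default `n_trials=10` would mirror the 10 trials reported for this model, with each trial trained to completion because no pruner or early stopping is involved in this sketch.
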
## 🧪 How to Use

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# The fine-tuned model uses the standard bert-base-uncased vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("nurdyansa/bert-framing-entman-multilabel-v1")
model.eval()  # disable dropout for inference

label_cols = ["define_problem", "diagnose_cause", "moral_judgment", "suggest_remedy"]

def predict_framing(text, thresholds=None):
    """Return a {label: bool} dict for one text; thresholds default to 0.5 per label."""
    if thresholds is None:
        thresholds = [0.5] * len(label_cols)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Sigmoid rather than softmax: each label is an independent binary decision.
    probs = torch.sigmoid(outputs.logits).squeeze(0)
    preds = (probs > torch.tensor(thresholds)).tolist()
    return {label: bool(pred) for label, pred in zip(label_cols, preds)}

# Example
text = "The government failed to address the root cause of the crisis."
print(predict_framing(text))
```

## 🔧 Configuration

```python
repo_name = "nurdyansa/bert-framing-entman-multilabel-v1"
```

## 📁 Dataset

Balanced dataset of English-language news articles annotated with 4 Entman-style framing labels:

- Define Problem
- Diagnose Cause
- Moral Judgment
- Suggest Remedy

## 🚀 Training Details

- Dataset size: 4,000+ English news articles
- Optimized using Optuna (10 trials)
- Training framework: Hugging Face Transformers (PyTorch)
- Evaluation strategy: per epoch
- Final model selected based on best macro F1-score

---

Model by [nurdyansa](https://huggingface.co/nurdyansa)

## 📚 Citation

If you use this model in your research or application, please cite it as:

```bibtex
@misc{nurdyansa_2025,
  author    = { Nurdyansa },
  title     = { bert-framing-entman-multilabel-v1 (Revision 057747b) },
  year      = 2025,
  url       = { https://huggingface.co/nurdyansa/bert-framing-entman-multilabel-v1 },
  doi       = { 10.57967/hf/5392 },
  publisher = { Hugging Face }
}
```

## 🤝 Contributing

I warmly invite researchers and practitioners to collaborate on improving this model's precision.
Please contribute by:

- Providing more annotated data.
- Improving label consistency or adding nuance.
- Suggesting improvements to model architecture or training methods.

If you are interested in collaborating, sharing insights, or further developing this model, feel free to reach out:

📧 Email: nurdyansa@gmail.com