DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)
This repository contains a token-classification model trained on the DAMASHA-MAS benchmark, introduced in:
DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution
The model segments mixed human–AI text at the token level, i.e., it decides for each token whether it was written by a human or an LLM, even under syntactic adversarial attacks.
- Base encoders: RoBERTa-base and ModernBERT-base
- Architecture (high level): RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the Info-Mask gating mechanism from the paper.
- Task: Token classification (binary authorship: human vs AI).
- Language: English
- License (this model): MIT
- Training data license: CC-BY-4.0 via the DAMASHA dataset.
If you use this model, please also cite the DAMASHA paper and dataset (see Citation section).
1. Model Highlights
- Fine-grained mixed-authorship detection: Predicts authorship per token, allowing reconstruction of human vs AI spans in long documents (see the inference sketch below).
- Adversarially robust: Trained and evaluated on syntactically attacked texts (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).
- Human-interpretable Info-Mask: The architecture incorporates stylometric features (perplexity, POS density, punctuation density, lexical diversity, readability) via an Info-Mask module that gates token representations in an interpretable way.
- Strong reported performance (from the paper): On DAMASHA-MAS, the RMC* model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - Token-level: Accuracy / Precision / Recall / F1 ≈ 0.98
  - Span-level (strict): SBDA ≈ 0.45, SegPre ≈ 0.41
  - Span-level (relaxed, IoU ≥ 0.5): ≈ 0.82
⚠️ The exact numbers for this specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC*).
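For quick experimentation, the sketch below shows how per-token predictions could be obtained and merged into contiguous authorship spans. It assumes the checkpoint loads through the standard `AutoModelForTokenClassification` head with a fast tokenizer; because RMC* is a custom dual-encoder architecture, loading may instead require the code from the project GitHub (see Section 4). Label names are read from `config.id2label` rather than hard-coded.

```python
# Minimal inference sketch. Assumptions: the checkpoint loads with the standard
# token-classification head and a fast tokenizer; the custom RMC* architecture
# may instead require the loading code from the project GitHub.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "saiteja33/DAMASHA-RMC"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
model.eval()

text = "An example paragraph that mixes human writing with LLM continuations."
enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True,
                truncation=True, max_length=512)
offsets = enc.pop("offset_mapping")[0].tolist()

with torch.no_grad():
    logits = model(**enc).logits                      # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()
labels = [model.config.id2label[i] for i in pred_ids]

# Merge consecutive tokens with the same predicted label into character spans.
spans, current = [], None
for (start, end), label in zip(offsets, labels):
    if start == end:                                  # special tokens have empty offsets
        continue
    if current and current[0] == label:
        current = (label, current[1], end)
    else:
        if current:
            spans.append(current)
        current = (label, start, end)
if current:
    spans.append(current)

for label, start, end in spans:
    print(f"[{label}] {text[start:end]!r}")
```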
2. Intended Use
What this model is for
Research on human–AI co-authorship
- Studying where LLMs “take over” in mixed texts.
- Analysing robustness of detectors under adversarial perturbations.
Tooling / applications (with human oversight)
- Helping editors, educators, or moderators highlight suspicious spans rather than make final decisions.
- Exploring interpretability overlays (e.g., heatmaps over tokens) when combined with Info-Mask outputs.
What this model is not for
- Automated “cheating detector” / plagiarism court.
- High-stakes decisions affecting people’s livelihood, grades, or reputation without human review.
- Non-English or heavily code-mixed text (training data is English-centric).
Use this model as a signal, not a judge.
3. Data: DAMASHA-MAS
The model is trained on the MAS benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:
- Dataset: saiteja33/DAMASHA
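A quick way to inspect the data, assuming it loads directly with the `datasets` library; check the dataset card for the actual split and configuration names.

```python
# Dataset loading sketch; split names are not assumed here, check the dataset card for details.
from datasets import load_dataset

ds = load_dataset("saiteja33/DAMASHA")
print(ds)                                  # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0].keys())               # expect fields such as hybrid_text, has_pair, attack_name
```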
3.1 What’s in MAS?
MAS consists of mixed human–AI texts with explicit span tags.
Human text comes from several corpora for domain diversity, including:
- Reddit (M4-Reddit)
- Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
- News summaries (XSUM)
- Wikipedia (M4-Wiki, MAGE-SQuAD)
- ArXiv abstracts (MAGE-SciGen)
- QA texts (MAGE-ELI5)
AI text is generated by multiple modern LLMs:
- DeepSeek-V3-671B (open-source)
- GPT-4o, GPT-4.1, GPT-4.1-mini (closed-source)
3.2 Span tagging
Authorship is marked using explicit tags around AI spans:
- `<AI_Start>…</AI_End>` tags denote AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are sentence-level in annotation, but the model is trained to output token-level predictions for finer segmentation.
During training, these tags are converted into token labels (2 labels total; see `config.id2label` in the model files).
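As an illustration of that conversion, the sketch below strips the `<AI_Start>…</AI_End>` markers, records which character ranges they enclosed, and labels every token whose offsets overlap an AI range. The exact tag strings, label mapping (here 1 = AI, 0 = human), and alignment rules used for this checkpoint live in the training code, so treat this as a hypothetical reimplementation rather than the project's own preprocessing.

```python
# Illustrative tag-to-token-label conversion (label mapping and alignment rules are assumptions).
import re
from transformers import AutoTokenizer

AI_SPAN = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)

def strip_tags(tagged_text):
    """Remove AI tags and return (clean_text, list of AI character ranges)."""
    clean, ranges, cursor, pos = [], [], 0, 0
    for m in AI_SPAN.finditer(tagged_text):
        clean.append(tagged_text[pos:m.start()])
        cursor += m.start() - pos
        ranges.append((cursor, cursor + len(m.group(1))))
        clean.append(m.group(1))
        cursor += len(m.group(1))
        pos = m.end()
    clean.append(tagged_text[pos:])
    return "".join(clean), ranges

def token_labels(tagged_text, tokenizer):
    text, ai_ranges = strip_tags(tagged_text)
    enc = tokenizer(text, return_offsets_mapping=True, truncation=True, max_length=512)
    labels = []
    for start, end in enc["offset_mapping"]:
        if start == end:                               # special tokens
            labels.append(-100)                        # conventionally ignored by the loss
        else:
            inside_ai = any(start < e and end > s for s, e in ai_ranges)
            labels.append(1 if inside_ai else 0)       # 1 = AI, 0 = human (assumed mapping)
    return labels

tok = AutoTokenizer.from_pretrained("roberta-base")
sample = "I wrote this sentence myself. <AI_Start>This continuation came from a model.</AI_End>"
print(token_labels(sample, tok))
```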
3.3 Adversarial attacks
MAS includes multiple syntactic attacks applied to the mixed text:
- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above
These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
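To make these attack types concrete, the sketch below applies a Unicode homoglyph substitution and inserts zero-width characters into a string. It mimics the spirit of the benchmark's perturbations; the homoglyph table, rates, and attack logic are illustrative, not the dataset's implementation.

```python
# Illustrative syntactic perturbations (not the benchmark's exact attack code).
import random

HOMOGLYPHS = {"a": "а", "e": "е", "o": "о", "i": "і"}   # Latin -> visually similar Cyrillic
ZERO_WIDTH = "\u200b"                                    # zero-width space

def unicode_substitution(text, rate=0.1, seed=0):
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

def invisible_characters(text, rate=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(ZERO_WIDTH)
    return "".join(out)

clean = "The model segments mixed human and AI text."
attacked = invisible_characters(unicode_substitution(clean))
print(repr(attacked))
```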
4. Model Architecture & Training
4.1 Architecture (conceptual)
The model follows the Info-Mask RMC* architecture described in the DAMASHA paper:
- Dual encoders
  - RoBERTa-base and ModernBERT-base encode the same input sequence.
- Feature fusion
  - Hidden states from both encoders are fused into a shared representation.
- Stylometric Info-Mask
  - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a scalar mask per token.
  - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
- Sequence model + CRF
  - A BiGRU layer captures sequential dependencies, followed by a CRF layer for structured token labeling with a sequence-level loss.
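The following PyTorch sketch shows the gating-and-sequence-modeling idea in code: fused encoder states are gated by a stylometric mask, then passed through a BiGRU and a CRF. The dimensions, the `pytorch-crf` dependency, the single shared tokenization for both encoders, and the per-token stylometric feature tensor are all simplifying assumptions; the actual RMC* implementation is in the project GitHub.

```python
# Conceptual sketch of an RMC*-style pipeline; dimensions and details are illustrative,
# see the project GitHub for the actual implementation.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF                     # pip install pytorch-crf (assumed dependency)

class InfoMaskSegmenter(nn.Module):
    def __init__(self, hidden=768, style_dim=5, num_labels=2):
        super().__init__()
        # In practice each encoder needs its own tokenization; a single shared
        # input_ids is used here only to keep the sketch short.
        self.roberta = AutoModel.from_pretrained("roberta-base")
        self.modernbert = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
        self.fuse = nn.Linear(hidden * 2, hidden)                 # feature fusion
        self.style_proj = nn.Linear(style_dim, hidden)            # stylometric features -> hidden
        self.style_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.bigru = nn.GRU(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, style_feats, labels=None):
        h_r = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        h_m = self.modernbert(input_ids, attention_mask=attention_mask).last_hidden_state
        fused = self.fuse(torch.cat([h_r, h_m], dim=-1))          # (B, T, hidden)

        # Info-Mask: attend from projected style features over the fused states,
        # then squash to a scalar gate per token and rescale the fused states.
        style = self.style_proj(style_feats)                      # (B, T, hidden)
        attended, _ = self.style_attn(style, fused, fused)
        gate = self.mask_head(attended)                           # (B, T, 1) in [0, 1]
        gated = fused * gate

        seq, _ = self.bigru(gated)
        emissions = self.classifier(seq)                          # (B, T, num_labels)

        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask, reduction="mean")  # training loss
        return self.crf.decode(emissions, mask=mask)              # list of label sequences
```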
4.2 Training setup (from the paper)
Key hyperparameters used for the Info-Mask models on MAS:
- Number of labels: 2
- Max sequence length: 512
- Batch size: 64
- Epochs: 5
- Optimizer: AdamW (with cosine annealing LR schedule)
- Weight decay: 0.01
- Gradient clipping: 1.0
- Dropout: Dynamic 0.1–0.3 (initial 0.1)
- Warmup ratio: 0.1
- Early stopping patience: 2
Hardware & compute (as reported):
- AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for experiments.
The exact training script used for this checkpoint is available in the project GitHub:
https://github.com/saitejalekkala33/DAMASHA
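For orientation, here is a hedged sketch of how the reported optimizer settings (AdamW with weight decay 0.01, a cosine schedule with 0.1 warmup, gradient clipping at 1.0, 5 epochs) fit together. The tiny linear model and random data are stand-ins so the loop runs end to end, and the learning rate is an assumption since it is not listed above; the real script is in the repository linked above.

```python
# Optimizer/schedule sketch wiring the hyperparameters reported above.
# The tiny model and random data are placeholders; the learning rate is an assumption.
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(16, 2)                                   # stand-in for the real segmenter
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
train_loader = DataLoader(data, batch_size=64, shuffle=True)

epochs, warmup_ratio, clip_norm = 5, 0.1, 1.0
num_training_steps = epochs * len(train_loader)

optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(warmup_ratio * num_training_steps),
    num_training_steps=num_training_steps,
)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(epochs):
    for x, y in train_loader:
        loss = loss_fn(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)   # clip at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```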