DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a token-classification model trained on the DAMASHA-MAS benchmark, introduced in:

DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution

The model segments mixed human–AI text at the token level, i.e., it decides for each token whether it was written by a human or an LLM, even under syntactic adversarial attacks.

  • Base encoders: RoBERTa-base and ModernBERT-base
  • Architecture (high level): RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the Info-Mask gating mechanism from the paper.
  • Task: Token classification (binary authorship: human vs AI).
  • Language: English
  • License (this model): MIT
  • Training data license: CC-BY-4.0 via the DAMASHA dataset.

If you use this model, please also cite the DAMASHA paper and dataset (see Citation section).
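
Quick start (illustrative). The snippet below is a minimal inference sketch, assuming the checkpoint can be loaded through the standard transformers token-classification interface and that config.id2label holds the human/AI label names; because the full RMC* architecture adds custom fusion, BiGRU, and CRF layers, loading may instead require the project code linked in Section 4.

```python
# Minimal inference sketch (assumes a standard token-classification head;
# the full RMC* model may require the custom code from the project repo).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "saiteja33/DAMASHA-RMC"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

text = "The experiment began as a small side project. It has since grown into a benchmark."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, 2)
pred_ids = logits.argmax(dim=-1)[0].tolist()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, model.config.id2label[pid])     # per-token human / AI label
```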


1. Model Highlights

  • Fine-grained mixed-authorship detection
    Predicts authorship per token, allowing reconstruction of human vs AI spans in long documents (see the span-matching sketch at the end of this section).

  • Adversarially robust
    Trained and evaluated on syntactically attacked texts (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

  • Human-interpretable Info-Mask
    The architecture incorporates stylometric features (perplexity, POS density, punctuation density, lexical diversity, readability) via an Info-Mask module that gates token representations in an interpretable way.

  • Strong reported performance (from the paper)
    On DAMASHA-MAS, the RMC* model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:

    • Token-level: Accuracy / Precision / Recall / F1 ≈ 0.98
    • Span-level (strict): SBDA ≈ 0.45, SegPre ≈ 0.41
    • Span-level (relaxed IoU ≥ 0.5): ≈ 0.82

⚠️ The exact numbers for this specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC*).
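
To make the span-level metrics concrete, here is a small, illustrative sketch (not the paper's evaluation code) that collapses per-token predictions into contiguous AI spans and matches them against gold spans at the relaxed IoU ≥ 0.5 threshold; SBDA and SegPre themselves are defined in the paper.

```python
# Illustrative only: token labels -> spans, then relaxed IoU >= 0.5 matching.

def labels_to_spans(labels, target=1):
    """Collapse a per-token label sequence into (start, end) spans of `target` (end exclusive)."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == target and start is None:
            start = i
        elif lab != target and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

def iou(a, b):
    """Intersection-over-union of two half-open token index spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

pred = [0, 1, 1, 1, 0, 0, 1, 1, 0]
gold = [0, 0, 1, 1, 1, 0, 1, 1, 0]

pred_spans = labels_to_spans(pred)   # [(1, 4), (6, 8)]
gold_spans = labels_to_spans(gold)   # [(2, 5), (6, 8)]

matched = sum(any(iou(p, g) >= 0.5 for g in gold_spans) for p in pred_spans)
print(f"{matched}/{len(pred_spans)} predicted AI spans match at IoU >= 0.5")
```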


2. Intended Use

What this model is for

  • Research on human–AI co-authorship

    • Studying where LLMs “take over” in mixed texts.
    • Analysing robustness of detectors under adversarial perturbations.
  • Tooling / applications (with human oversight)

    • Helping editors, educators, or moderators highlight suspicious spans rather than making final decisions for them.
    • Exploring interpretability overlays (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

What this model is not for

  • Automated “cheating detector” / plagiarism court.
  • High-stakes decisions affecting people’s livelihood, grades, or reputation without human review.
  • Non-English or heavily code-mixed text (training data is English-centric).

Use this model as a signal, not a judge.


3. Data: DAMASHA-MAS

The model is trained on the MAS benchmark released with the DAMASHA paper and hosted as a Hugging Face dataset.

3.1 What’s in MAS?

MAS consists of mixed human–AI texts with explicit span tags:

  • Human text comes from several corpora for domain diversity, including:

    • Reddit (M4-Reddit)
    • Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
    • News summaries (XSUM)
    • Wikipedia (M4-Wiki, MAGE-SQuAD)
    • ArXiv abstracts (MAGE-SciGen)
    • QA texts (MAGE-ELI5)
  • AI text is generated by multiple modern LLMs:

    • DeepSeek-V3-671B (open-source)
    • GPT-4o, GPT-4.1, GPT-4.1-mini (closed-source)

3.2 Span tagging

Authorship is marked using explicit tags around AI spans:

  • <AI_Start> and <AI_End> tags delimit AI-generated segments within otherwise human text.
  • The dataset stores text in a hybrid_text column, plus metadata such as has_pair, and adversarial variants include attack_name, tag_count, and attacked_text.
  • Tags are sentence-level in annotation, but the model is trained to output token-level predictions for finer segmentation.

During training, these tags are converted into token labels (2 labels total; see config.id2label in the model files).
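
For illustration, a simplified sketch of how the span tags could be projected onto token labels is shown below. It is not the project's preprocessing code; it assumes a fast tokenizer (for offset mappings) and, purely for the example, that label 0 is human and label 1 is AI (check config.id2label for the actual mapping).

```python
import re

# Simplified illustration of converting <AI_Start>/<AI_End> tags into binary
# token labels (here: 0 = human, 1 = AI). The project's preprocessing may differ.

def tag_text_to_labels(hybrid_text, tokenizer):
    """Strip the span tags and label each token by the span it falls in."""
    clean_parts, char_labels, labels = [], [], []
    current = 0
    for part in re.split(r"(<AI_Start>|<AI_End>)", hybrid_text):
        if part == "<AI_Start>":
            current = 1
        elif part == "<AI_End>":
            current = 0
        else:
            clean_parts.append(part)
            char_labels.extend([current] * len(part))
    clean_text = "".join(clean_parts)

    # Use the fast tokenizer's offset mapping to project character labels onto tokens.
    enc = tokenizer(clean_text, return_offsets_mapping=True, truncation=True, max_length=512)
    for start, end in enc["offset_mapping"]:
        if start == end:            # special tokens
            labels.append(-100)     # ignored by the loss
        else:
            labels.append(char_labels[start])
    return clean_text, enc, labels
```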

3.3 Adversarial attacks

MAS includes multiple syntactic attacks applied to the mixed text:

  • Misspelling
  • Unicode character substitution
  • Invisible characters
  • Punctuation substitution
  • Upper/lower case swapping
  • All-mixed combinations of the above

These perturbations make tokenization brittle and test robustness of detectors in realistic settings.
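
As a toy illustration of why such perturbations are hard on tokenizers, the sketch below applies a homoglyph (Unicode) substitution and zero-width-character insertion to a sentence; the actual MAS attack implementations may differ.

```python
import random

# Toy illustration of two syntactic attacks; the MAS implementations may differ.
HOMOGLYPHS = {"a": "а", "e": "е", "o": "о"}   # Latin -> visually similar Cyrillic
ZERO_WIDTH_SPACE = "\u200b"

def unicode_substitution(text, rate=0.1, seed=0):
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

def invisible_characters(text, rate=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch == " " and rng.random() < rate:
            out.append(ZERO_WIDTH_SPACE)
    return "".join(out)

attacked = invisible_characters(unicode_substitution("The model segments mixed text."))
print(attacked)  # looks identical on screen, but the byte sequence (and tokenization) changes
```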


4. Model Architecture & Training

4.1 Architecture (conceptual)

The model follows the Info-Mask RMC* architecture described in the DAMASHA paper (a minimal code sketch follows the steps below):

  1. Dual encoders
    • RoBERTa-base and ModernBERT-base encode the same input sequence.
  2. Feature fusion
    • Hidden states from both encoders are fused into a shared representation.
  3. Stylometric Info-Mask
    • Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a scalar mask per token.
    • This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
  4. Sequence model + CRF
    • A BiGRU layer captures sequential dependencies, followed by a CRF layer for structured token labeling with a sequence-level loss.
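
Below is a minimal, illustrative PyTorch sketch of this pipeline, not the released implementation. It assumes 768-dimensional encoder outputs, per-token stylometric feature vectors, and a CRF layer with the pytorch-crf API; for simplicity it also feeds the same token ids to both encoders, whereas in practice the two tokenizers differ and their outputs must be aligned.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # e.g. the pytorch-crf package; any CRF layer with the same API works

class InfoMaskRMC(nn.Module):
    """Sketch of the RMC* pipeline: dual encoders -> fusion -> Info-Mask gating -> BiGRU -> CRF."""

    def __init__(self, style_dim=5, hidden=768, num_labels=2):
        super().__init__()
        self.roberta = AutoModel.from_pretrained("roberta-base")
        self.modernbert = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
        self.fuse = nn.Linear(2 * hidden, hidden)

        # Info-Mask: project stylometric features, attend, squash to a scalar gate per token.
        self.style_proj = nn.Linear(style_dim, hidden)
        self.style_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

        self.bigru = nn.GRU(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, style_feats, labels=None):
        # NOTE: a real implementation must tokenize separately for each encoder and align outputs.
        h1 = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        h2 = self.modernbert(input_ids, attention_mask=attention_mask).last_hidden_state
        fused = self.fuse(torch.cat([h1, h2], dim=-1))            # (B, T, H)

        s = self.style_proj(style_feats)                          # (B, T, H)
        s, _ = self.style_attn(s, s, s)
        gate = self.mask_head(s)                                  # (B, T, 1): scalar mask per token
        gated = fused * gate                                      # gate the fused states

        seq, _ = self.bigru(gated)
        emissions = self.classifier(seq)                          # (B, T, num_labels)

        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask, reduction="mean")  # sequence-level NLL
        return self.crf.decode(emissions, mask=mask)              # list of label sequences
```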

4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS (a configuration sketch follows the list):

  • Number of labels: 2
  • Max sequence length: 512
  • Batch size: 64
  • Epochs: 5
  • Optimizer: AdamW (with cosine annealing LR schedule)
  • Weight decay: 0.01
  • Gradient clipping: 1.0
  • Dropout: Dynamic 0.1–0.3 (initial 0.1)
  • Warmup ratio: 0.1
  • Early stopping patience: 2
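
A hedged sketch of how these hyperparameters could be wired together with PyTorch and transformers utilities is shown below; the learning rate, dataset, and model objects are placeholders (the paper does not report them here), and early stopping plus the dynamic dropout schedule are omitted for brevity.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative wiring of the reported hyperparameters; see the project repo for the real script.
EPOCHS, BATCH_SIZE, MAX_LEN = 5, 64, 512
steps_per_epoch = len(train_dataset) // BATCH_SIZE        # train_dataset: your tokenized MAS split
total_steps = EPOCHS * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)  # lr is a placeholder
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),              # warmup ratio 0.1
    num_training_steps=total_steps,
)

for epoch in range(EPOCHS):
    for batch in train_loader:
        loss = model(**batch)                             # e.g. CRF negative log-likelihood
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```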

Hardware & compute (as reported):

  • AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
  • ≈ 400 GPU hours for experiments.

The exact training script used for this checkpoint is available in the project GitHub:
https://github.com/saitejalekkala33/DAMASHA


