DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a token-classification model trained on the DAMASHA-MAS benchmark, introduced in:

DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution

The model segments mixed human–AI text at the token level, i.e., it decides for each token whether it was written by a human or an LLM, even under syntactic adversarial attacks.

  • Base encoders: RoBERTa-base and ModernBERT-base
  • Architecture (high level): RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the Info-Mask gating mechanism from the paper.
  • Task: Token classification (binary authorship: human vs AI).
  • Language: English
  • License (this model): MIT
  • Training data license: CC-BY-4.0 via the DAMASHA dataset.

If you use this model, please also cite the DAMASHA paper and dataset (see Citation section).
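
Quick start (illustrative). The snippet below is a minimal inference sketch, assuming the checkpoint can be loaded through the standard transformers token-classification interface and that config.id2label holds the human/AI label names; because the full RMC* architecture adds custom fusion, BiGRU, and CRF layers, loading may instead require the project code linked in Section 4.

```python
# Minimal inference sketch (assumes a standard token-classification head;
# the full RMC* model may require the custom code from the project repo).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "saiteja33/DAMASHA-RMC"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

text = "The experiment began as a small side project. It has since grown into a benchmark."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, 2)
pred_ids = logits.argmax(dim=-1)[0].tolist()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pid in zip(tokens, pred_ids):
    print(tok, model.config.id2label[pid])     # per-token human / AI label
```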


1. Model Highlights

  • Fine-grained mixed-authorship detection
    Predicts authorship per token, allowing reconstruction of human vs AI spans in long documents (see the span-matching sketch at the end of this section).

  • Adversarially robust
    Trained and evaluated on syntactically attacked texts (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

  • Human-interpretable Info-Mask
    The architecture incorporates stylometric features (perplexity, POS density, punctuation density, lexical diversity, readability) via an Info-Mask module that gates token representations in an interpretable way.

  • Strong reported performance (from the paper)
    On DAMASHA-MAS, the RMC* model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:

    • Token-level: Accuracy / Precision / Recall / F1 ≈ 0.98
    • Span-level (strict): SBDA ≈ 0.45, SegPre ≈ 0.41
    • Span-level (relaxed IoU ≥ 0.5): ≈ 0.82

⚠️ The exact numbers for this specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC*).
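
To make the span-level metrics concrete, here is a small, illustrative sketch (not the paper's evaluation code) that collapses per-token predictions into contiguous AI spans and matches them against gold spans at the relaxed IoU ≥ 0.5 threshold; SBDA and SegPre themselves are defined in the paper.

```python
# Illustrative only: token labels -> spans, then relaxed IoU >= 0.5 matching.

def labels_to_spans(labels, target=1):
    """Collapse a per-token label sequence into (start, end) spans of `target` (end exclusive)."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == target and start is None:
            start = i
        elif lab != target and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

def iou(a, b):
    """Intersection-over-union of two half-open token index spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

pred = [0, 1, 1, 1, 0, 0, 1, 1, 0]
gold = [0, 0, 1, 1, 1, 0, 1, 1, 0]

pred_spans = labels_to_spans(pred)   # [(1, 4), (6, 8)]
gold_spans = labels_to_spans(gold)   # [(2, 5), (6, 8)]

matched = sum(any(iou(p, g) >= 0.5 for g in gold_spans) for p in pred_spans)
print(f"{matched}/{len(pred_spans)} predicted AI spans match at IoU >= 0.5")
```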


2. Intended Use

What this model is for

  • Research on human–AI co-authorship

    • Studying where LLMs “take over” in mixed texts.
    • Analysing robustness of detectors under adversarial perturbations.
  • Tooling / applications (with human oversight)

    • Helping editors, educators, or moderators highlight suspicious spans rather than making final decisions for them.
    • Exploring interpretability overlays (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

What this model is not for

  • Automated “cheating detector” / plagiarism court.
  • High-stakes decisions affecting people’s livelihood, grades, or reputation without human review.
  • Non-English or heavily code-mixed text (training data is English-centric).

Use this model as a signal, not a judge.


3. Data: DAMASHA-MAS

The model is trained on the MAS benchmark released with the DAMASHA paper and hosted as a Hugging Face dataset.

3.1 What’s in MAS?

MAS consists of mixed human–AI texts with explicit span tags:

  • Human text comes from several corpora for domain diversity, including:

    • Reddit (M4-Reddit)
    • Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
    • News summaries (XSUM)
    • Wikipedia (M4-Wiki, MAGE-SQuAD)
    • ArXiv abstracts (MAGE-SciGen)
    • QA texts (MAGE-ELI5)
  • AI text is generated by multiple modern LLMs:

    • DeepSeek-V3-671B (open-source)
    • GPT-4o, GPT-4.1, GPT-4.1-mini (closed-source)

3.2 Span tagging

Authorship is marked using explicit tags around AI spans:

  • <AI_Start> and <AI_End> tags delimit AI-generated segments within otherwise human text.
  • The dataset stores text in a hybrid_text column, plus metadata such as has_pair, and adversarial variants include attack_name, tag_count, and attacked_text.
  • Tags are sentence-level in annotation, but the model is trained to output token-level predictions for finer segmentation.

During training, these tags are converted into token labels (2 labels total; see config.id2label in the model files).
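
For illustration, a simplified sketch of how the span tags could be projected onto token labels is shown below. It is not the project's preprocessing code; it assumes a fast tokenizer (for offset mappings) and, purely for the example, that label 0 is human and label 1 is AI (check config.id2label for the actual mapping).

```python
import re

# Simplified illustration of converting <AI_Start>/<AI_End> tags into binary
# token labels (here: 0 = human, 1 = AI). The project's preprocessing may differ.

def tag_text_to_labels(hybrid_text, tokenizer):
    """Strip the span tags and label each token by the span it falls in."""
    clean_parts, char_labels, labels = [], [], []
    current = 0
    for part in re.split(r"(<AI_Start>|<AI_End>)", hybrid_text):
        if part == "<AI_Start>":
            current = 1
        elif part == "<AI_End>":
            current = 0
        else:
            clean_parts.append(part)
            char_labels.extend([current] * len(part))
    clean_text = "".join(clean_parts)

    # Use the fast tokenizer's offset mapping to project character labels onto tokens.
    enc = tokenizer(clean_text, return_offsets_mapping=True, truncation=True, max_length=512)
    for start, end in enc["offset_mapping"]:
        if start == end:            # special tokens
            labels.append(-100)     # ignored by the loss
        else:
            labels.append(char_labels[start])
    return clean_text, enc, labels
```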

3.3 Adversarial attacks

MAS includes multiple syntactic attacks applied to the mixed text:

  • Misspelling
  • Unicode character substitution
  • Invisible characters
  • Punctuation substitution
  • Upper/lower case swapping
  • All-mixed combinations of the above

These perturbations make tokenization brittle and test robustness of detectors in realistic settings.
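
As a toy illustration of why such perturbations are hard on tokenizers, the sketch below applies a homoglyph (Unicode) substitution and zero-width-character insertion to a sentence; the actual MAS attack implementations may differ.

```python
import random

# Toy illustration of two syntactic attacks; the MAS implementations may differ.
HOMOGLYPHS = {"a": "а", "e": "е", "o": "о"}   # Latin -> visually similar Cyrillic
ZERO_WIDTH_SPACE = "\u200b"

def unicode_substitution(text, rate=0.1, seed=0):
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

def invisible_characters(text, rate=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch == " " and rng.random() < rate:
            out.append(ZERO_WIDTH_SPACE)
    return "".join(out)

attacked = invisible_characters(unicode_substitution("The model segments mixed text."))
print(attacked)  # looks identical on screen, but the byte sequence (and tokenization) changes
```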


4. Model Architecture & Training

4.1 Architecture (conceptual)

The model follows the Info-Mask RMC* architecture described in the DAMASHA paper (a minimal code sketch follows the steps below):

  1. Dual encoders
    • RoBERTa-base and ModernBERT-base encode the same input sequence.
  2. Feature fusion
    • Hidden states from both encoders are fused into a shared representation.
  3. Stylometric Info-Mask
    • Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a scalar mask per token.
    • This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
  4. Sequence model + CRF
    • A BiGRU layer captures sequential dependencies, followed by a CRF layer for structured token labeling with a sequence-level loss.
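
Below is a minimal, illustrative PyTorch sketch of this pipeline, not the released implementation. It assumes 768-dimensional encoder outputs, per-token stylometric feature vectors, and a CRF layer with the pytorch-crf API; for simplicity it also feeds the same token ids to both encoders, whereas in practice the two tokenizers differ and their outputs must be aligned.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # e.g. the pytorch-crf package; any CRF layer with the same API works

class InfoMaskRMC(nn.Module):
    """Sketch of the RMC* pipeline: dual encoders -> fusion -> Info-Mask gating -> BiGRU -> CRF."""

    def __init__(self, style_dim=5, hidden=768, num_labels=2):
        super().__init__()
        self.roberta = AutoModel.from_pretrained("roberta-base")
        self.modernbert = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
        self.fuse = nn.Linear(2 * hidden, hidden)

        # Info-Mask: project stylometric features, attend, squash to a scalar gate per token.
        self.style_proj = nn.Linear(style_dim, hidden)
        self.style_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

        self.bigru = nn.GRU(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, style_feats, labels=None):
        # NOTE: a real implementation must tokenize separately for each encoder and align outputs.
        h1 = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        h2 = self.modernbert(input_ids, attention_mask=attention_mask).last_hidden_state
        fused = self.fuse(torch.cat([h1, h2], dim=-1))            # (B, T, H)

        s = self.style_proj(style_feats)                          # (B, T, H)
        s, _ = self.style_attn(s, s, s)
        gate = self.mask_head(s)                                  # (B, T, 1): scalar mask per token
        gated = fused * gate                                      # gate the fused states

        seq, _ = self.bigru(gated)
        emissions = self.classifier(seq)                          # (B, T, num_labels)

        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask, reduction="mean")  # sequence-level NLL
        return self.crf.decode(emissions, mask=mask)              # list of label sequences
```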

4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS (a configuration sketch follows the list):

  • Number of labels: 2
  • Max sequence length: 512
  • Batch size: 64
  • Epochs: 5
  • Optimizer: AdamW (with cosine annealing LR schedule)
  • Weight decay: 0.01
  • Gradient clipping: 1.0
  • Dropout: Dynamic 0.1–0.3 (initial 0.1)
  • Warmup ratio: 0.1
  • Early stopping patience: 2
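
A hedged sketch of how these hyperparameters could be wired together with PyTorch and transformers utilities is shown below; the learning rate, dataset, and model objects are placeholders (the paper does not report them here), and early stopping plus the dynamic dropout schedule are omitted for brevity.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative wiring of the reported hyperparameters; see the project repo for the real script.
EPOCHS, BATCH_SIZE, MAX_LEN = 5, 64, 512
steps_per_epoch = len(train_dataset) // BATCH_SIZE        # train_dataset: your tokenized MAS split
total_steps = EPOCHS * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)  # lr is a placeholder
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),              # warmup ratio 0.1
    num_training_steps=total_steps,
)

for epoch in range(EPOCHS):
    for batch in train_loader:
        loss = model(**batch)                             # e.g. CRF negative log-likelihood
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```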

Hardware & compute (as reported):

  • AWS EC2 g6e.xlarge, NVIDIA L40S (48GB) GPU, Ubuntu 24.04
  • ≈ 400 GPU hours for experiments.

The exact training script used for this checkpoint is available in the project GitHub:
https://github.com/saitejalekkala33/DAMASHA


