---
license: mit
datasets:
- saiteja33/DAMASHA
language:
- en
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---

# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**

The model aims to **segment mixed human–AI text** at the *token level*, i.e., decide for each token whether it was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.

If you use this model, **please also cite the DAMASHA paper and dataset** (see Citation section).

---

## 1. Model Highlights

- **Fine-grained mixed-authorship detection**
  Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.
- **Adversarially robust**
  Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).
- **Human-interpretable Info-Mask**
  The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.
- **Strong reported performance (from the paper)**
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - **Token-level**: Accuracy / Precision / Recall / F1 ≈ **0.98**
  - **Span-level (strict)**: SBDA ≈ **0.45**, SegPre ≈ **0.41**
  - **Span-level (relaxed, IoU ≥ 0.5)**: ≈ **0.82**

> ⚠️ The exact numbers for *this* specific checkpoint may differ depending on training run and configuration. The values above are from the paper’s best configuration (RMC\*).

---

## 2. Intended Use

### What this model is for

- **Research on human–AI co-authorship**
  - Studying where LLMs “take over” in mixed texts.
  - Analysing robustness of detectors under adversarial perturbations.
- **Tooling / applications (with human oversight)**
  - Assisting editors, educators, or moderators by **highlighting suspicious spans** rather than making final decisions.
  - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

### What this model is *not* for

- Automated “cheating detector” / plagiarism court.
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (training data is English-centric).

Use this model as a **signal**, not a judge.
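### Quick usage sketch

The snippet below is a minimal sketch of token-level inference with 🤗 Transformers. The model id is a placeholder for this repository, and because RMC\* is a custom dual-encoder architecture, the checkpoint may require the project’s own loading code (or `trust_remote_code=True`) rather than `AutoModelForTokenClassification`.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Placeholder -- replace with the actual Hub id of this model repository.
model_id = "<this-model-repo-id>"

# Assumes the checkpoint exposes a standard token-classification interface;
# the custom RMC* head may instead need the project's loading code.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

clf = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge sub-word pieces into word-level groups
)

text = "An example paragraph that may mix human-written and AI-generated sentences."
for group in clf(text):
    # Each group carries a predicted authorship label (see config.id2label),
    # a confidence score, and character offsets for reconstructing spans.
    print(group["entity_group"], round(group["score"], 3),
          text[group["start"]:group["end"]])
```

The character offsets in the aggregated groups can be used to render human vs AI span highlights over the original text.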
---

## 3. Data: DAMASHA-MAS

The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:

- Human text comes from several corpora for **domain diversity**, including:
  - Reddit (M4-Reddit)
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
  - News summaries (XSUM)
  - Wikipedia (M4-Wiki, MAGE-SQuAD)
  - ArXiv abstracts (MAGE-SciGen)
  - QA texts (MAGE-ELI5)
- AI text is generated by multiple modern LLMs:
  - **DeepSeek-V3-671B** (open-source)
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:

- Opening and closing tags delimit AI-generated segments within otherwise human text.
- The dataset stores text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants additionally include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are applied at sentence level in the annotation, but the model is trained to output **token-level** predictions for finer segmentation.

> During training, these tags are converted into **token labels** (2 labels total; see `config.id2label` in the model files).

### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:

- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above

These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.

---

## 4. Model Architecture & Training

### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper:

1. **Dual encoders**
   - RoBERTa-base and ModernBERT-base encode the same input sequence.
2. **Feature fusion**
   - Hidden states from both encoders are fused into a shared representation.
3. **Stylometric Info-Mask**
   - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
   - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
4. **Sequence model + CRF**
   - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.

### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:

- **Number of labels:** 2
- **Max sequence length:** 512
- **Batch size:** 64
- **Epochs:** 5
- **Optimizer:** AdamW (with cosine annealing LR schedule)
- **Weight decay:** 0.01
- **Gradient clipping:** 1.0
- **Dropout:** Dynamic 0.1–0.3 (initial 0.1)
- **Warmup ratio:** 0.1
- **Early stopping patience:** 2

**Hardware & compute** (as reported):

- AWS EC2 g6e.xlarge, NVIDIA L40S (48 GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for experiments.

> The exact training script used for this checkpoint is available in the project GitHub:
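For orientation, here is a minimal, illustrative training-loop sketch consistent with the hyperparameters above (AdamW, cosine annealing, weight decay 0.01, gradient clipping 1.0, 5 epochs). It is **not** the project’s actual script: `model`, `train_loader`, and the learning rate are placeholders/assumptions, and warmup, dynamic dropout, and early stopping are omitted for brevity.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholders: `model` is assumed to be the RMC* token-classification model
# (returning its CRF sequence-level loss), and `train_loader` a DataLoader over
# tokenized MAS examples (max length 512, batch size 64).
EPOCHS = 5
LEARNING_RATE = 2e-5  # assumption -- not reported in the summary above

optimizer = AdamW(model.parameters(), lr=LEARNING_RATE, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * len(train_loader))

for epoch in range(EPOCHS):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model(**batch)  # assumed to return the CRF loss directly
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
        optimizer.step()
        scheduler.step()  # cosine annealing, stepped per batch
```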