---
license: mit
tags:
- quote-attribution
- speaker-identification
- dialogue-attribution
- nlp
- transformers
- bert
language:
- en
datasets:
- aNameNobodyChose/quote-speaker-attribution
---

# 🗣️ QuoteCaster: Speaker-Aware Quote Encoder

**QuoteCaster** is a fine-tuned BERT-based model that encodes dialogue quotes together with their surrounding context in order to **identify or group quotes by speaker**, even in stories the model has never seen before.

The encoder supports unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles (plus context) to nearby points in embedding space, which makes it well suited to clustering and nearest-neighbor speaker inference.

---

## 📦 Model Details

- **Base model**: `bert-base-uncased`
- **Trained with**: Triplet Margin Loss
- **Objective**: pull quotes from the same speaker together, push quotes from different speakers apart
- **Input**: `context [SEP] quote`
- **Output**: the `[CLS]` embedding, a 768-dimensional vector

---

## 📊 Use Cases

QuoteCaster is well suited to:

- 🧠 Clustering quotes by speaker using KMeans or agglomerative clustering
- 🔍 Zero-shot speaker inference on unseen stories
- 🧪 Dialogue structure analysis in novels, scripts, and plays

---

## 🚀 Example: Inference with QuoteCaster

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()

# Encode a quote together with its surrounding context
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, 768)
```

The returned vectors can be compared with cosine similarity or stacked into a matrix for clustering.
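## 🧩 Example: Clustering Quotes by Speaker

As a minimal sketch of the clustering workflow described above: in practice you would stack the `[CLS]` vectors produced by `encode_quote` into one matrix; the random vectors below merely stand in for embeddings of two hypothetical speakers so the example runs without the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a (n_quotes x 768) matrix of QuoteCaster [CLS] embeddings.
# In real use, build this by calling encode_quote(...) for each quote.
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, scale=0.1, size=(5, 768))  # quotes 0-4
speaker_b = rng.normal(loc=1.0, scale=0.1, size=(5, 768))  # quotes 5-9
embeddings = np.vstack([speaker_a, speaker_b])

# Group quotes into a guessed number of speakers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)
print(labels)  # labels[i] is the pseudo-speaker id assigned to quote i
```

When the number of speakers is unknown, `sklearn.cluster.AgglomerativeClustering` with a distance threshold is a common alternative to fixing `n_clusters` in advance.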