---
license: mit
tags:
- quote-attribution
- speaker-identification
- dialogue-attribution
- nlp
- transformers
- bert
language:
- en
datasets:
- aNameNobodyChose/quote-speaker-attribution
---

# 🗣️ QuoteCaster: Speaker-Aware Quote Encoder

**QuoteCaster** is a fine-tuned BERT-based model that encodes dialogue quotes together with their surrounding context in order to **identify or group quotes by speaker**, even in stories the model has never seen before.

The encoder supports unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles (plus context) to nearby points in embedding space, which makes it well suited to clustering and nearest-neighbor speaker inference.

---

## 📦 Model Details

- **Base model**: `bert-base-uncased`
- **Trained with**: Triplet Margin Loss
- **Objective**: pull quotes from the same speaker together, push quotes from different speakers apart
- **Input**: `context [SEP] quote`
- **Output**: the `[CLS]` embedding, a 768-dimensional vector

---

## 📊 Use Cases

QuoteCaster is well suited to:

- 🧠 Clustering quotes by speaker using KMeans or agglomerative clustering
- 🔍 Zero-shot speaker inference on unseen stories
- 🧪 Dialogue structure analysis in novels, scripts, and plays

---

## 🚀 Example: Inference with QuoteCaster

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()

# Encode a quote together with its surrounding context
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, 768)
```

The returned vectors can be compared with cosine similarity or stacked into a matrix for clustering.
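## 🧩 Example: Clustering Quotes by Speaker

As a minimal sketch of the clustering workflow described above: in practice you would stack the `[CLS]` vectors produced by `encode_quote` into one matrix; the random vectors below merely stand in for embeddings of two hypothetical speakers so the example runs without the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a (n_quotes x 768) matrix of QuoteCaster [CLS] embeddings.
# In real use, build this by calling encode_quote(...) for each quote.
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, scale=0.1, size=(5, 768))  # quotes 0-4
speaker_b = rng.normal(loc=1.0, scale=0.1, size=(5, 768))  # quotes 5-9
embeddings = np.vstack([speaker_a, speaker_b])

# Group quotes into a guessed number of speakers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)
print(labels)  # labels[i] is the pseudo-speaker id assigned to quote i
```

When the number of speakers is unknown, `sklearn.cluster.AgglomerativeClustering` with a distance threshold is a common alternative to fixing `n_clusters` in advance.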