# Latin Intertextuality Embedding Model

This model is a fine-tuned version of SPhilBerta that generates embeddings of Latin texts for detecting intertextual relationships between Jerome (Hieronymus) and other classical authors. It is designed to integrate with the LociSimiles Python package for Latin intertextuality workflows: https://pypi.org/project/locisimiles/.
## Model Description
- Task: Sentence embedding for detecting intertextual links between classical Latin authors
- Model type: Sentence Transformer (Embedding Model)
- Base model: bowphs/SPhilBerta
- Max input tokens: 512
- Language: Latin
- License: Apache 2.0
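These properties can be checked programmatically after loading; a quick sketch using standard sentence-transformers attributes (the `prompts` attribute assumes a recent library version):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("julian-schelb/SPhilBerta-emb-lat-intertext-v1")

print(model.max_seq_length)                      # maximum input tokens (512)
print(model.get_sentence_embedding_dimension())  # output embedding dimensionality
print(model.prompts)                             # registered prompt names and their prefixes
```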
## Usage

This model generates dense vector embeddings for Latin text that can be used for semantic similarity tasks, particularly for detecting intertextual relationships. **Important:** the model was trained with prompts and should be used with the appropriate prompt names for optimal performance.
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
import numpy as np

# Load the model
model = SentenceTransformer("julian-schelb/SPhilBerta-emb-lat-intertext-v1")

# Example: a Jerome query and three candidates (1 positive match, 2 unrelated)
queries = [
    "omnia fert aetas, animum quoque; saepe ego longos cantando puerum memini me condere soles."
]
candidates = [
    "saepe ego longos cantando puerum memini me condere soles.",  # Positive match (subset of the query)
    "Gallia est omnis divisa in partes tres",                     # Unrelated (Caesar)
    "in nova fert animus mutatas dicere formas",                  # Unrelated (Ovid)
]

# Generate embeddings using the trained prompt names
query_embeddings = model.encode(queries, prompt_name="query")
candidate_embeddings = model.encode(candidates, prompt_name="match")

# Compute the cosine similarity matrix (queries x candidates)
cosine_similarity_matrix = cos_sim(query_embeddings, candidate_embeddings)
print("Cosine Similarity Matrix:")
print("Query vs [Positive_Match, Caesar, Ovid]")
print(cosine_similarity_matrix[0].numpy())
print(f"Highest similarity: {cosine_similarity_matrix[0].max().item():.4f} "
      f"(index: {cosine_similarity_matrix[0].argmax().item()})")

# Alternative: manual cosine similarity between two single embeddings
query_embedding = model.encode(queries[0], prompt_name="query")
candidate_embedding = model.encode(candidates[0], prompt_name="match")
cosine_sim = np.dot(query_embedding, candidate_embedding) / (
    np.linalg.norm(query_embedding) * np.linalg.norm(candidate_embedding)
)
print(f"\nDirect cosine similarity with positive match: {cosine_sim:.4f}")
```
## Prompts

This model was trained with the following prompts:

- Query texts: use `prompt_name="query"` (corresponds to the `"Query: "` prefix)
- Candidate texts: use `prompt_name="match"` (corresponds to the `"Candidate: "` prefix)

For best results, always use the appropriate prompt names when encoding texts for similarity comparison.
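In recent sentence-transformers versions, `encode` also accepts the raw prefix via the `prompt` argument. Assuming the prefixes listed above, the two calls in this sketch should yield identical embeddings:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("julian-schelb/SPhilBerta-emb-lat-intertext-v1")
text = "omnia fert aetas, animum quoque"

emb_by_name = model.encode(text, prompt_name="query")  # registered prompt name
emb_by_prefix = model.encode(text, prompt="Query: ")   # raw prefix passed directly

assert np.allclose(emb_by_name, emb_by_prefix)
```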
## Citation
TBD
## Evaluation Results

All results are self-reported on the Latin Intertextuality Dataset:

| Metric      | Value |
|-------------|-------|
| Recall@1    | 0.282 |
| Recall@5    | 0.367 |
| Recall@10   | 0.399 |
| Recall@20   | 0.452 |
| Recall@100  | 0.553 |
| Recall@1000 | 0.750 |
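Recall@k measures how often the true intertextual source appears among the top-k retrieved candidates. A minimal, illustrative sketch of the computation (variable names and toy data are hypothetical; this is not the evaluation script behind the numbers above):

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold candidate index appears in the top-k.

    similarity: (n_queries, n_candidates) score matrix
    gold: (n_queries,) index of the correct candidate per query
    """
    # Indices of the k highest-scoring candidates per query
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = (top_k == gold[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 2 queries, 4 candidates
sims = np.array([[0.9, 0.1, 0.2, 0.3],
                 [0.2, 0.8, 0.1, 0.4]])
gold = np.array([0, 3])
print(recall_at_k(sims, gold, k=1))  # 0.5: only the first query's gold is ranked first
print(recall_at_k(sims, gold, k=2))  # 1.0: both golds appear in the top 2
```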