# 🇹🇷 Turkish Embedding Model (bge-m3 Fine-tuned)
This model is a Turkish fine-tuned version of BAAI/bge-m3, optimized for Turkish semantic similarity, retrieval, and RAG (Retrieval-Augmented Generation) tasks.
It maps Turkish sentences and paragraphs into a 1024-dimensional dense vector space.
## Model Overview
| Property | Value |
|---|---|
| Base Model | BAAI/bge-m3 |
| Architecture | XLM-RoBERTa + Pooling + Normalize |
| Embedding Dimension | 1024 |
| Max Sequence Length | 8192 |
| Similarity Function | Cosine |
| Loss Functions | MultipleNegativesRankingLoss + TripletLoss |
| Language | Turkish 🇹🇷 |
| Use Cases | Semantic Search, Text Similarity, RAG, Clustering (see the sketch below) |
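For the retrieval-style use cases listed above, the model plugs directly into the standard sentence-transformers utilities. Below is a minimal semantic-search sketch; the corpus and query are illustrative placeholders, not data the model was trained or evaluated on.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nezahatkorkmaz/turkce-embedding-bge-m3")

# Toy Turkish corpus (illustrative placeholders)
corpus = [
    "Ankara Türkiye'nin başkentidir.",     # "Ankara is the capital of Türkiye."
    "İstanbul Boğazı iki kıtayı ayırır.",  # "The Bosphorus separates two continents."
    "Kedi bahçede uyuyor.",                # "The cat is sleeping in the garden."
]
query = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Türkiye?"

# Encode with normalization so cosine similarity reduces to a dot product
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# util.semantic_search returns, per query, a ranked list of {corpus_id, score}
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```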
## Evaluation Results
The model was evaluated on a Turkish Semantic Textual Similarity (STS) dataset.
Compared to the base multilingual BGE-M3 model, the fine-tuned model shows a notable improvement in Pearson correlation, indicating better alignment between cosine similarity scores and human judgments.
| Metric | Base (BAAI/bge-m3) | Fine-tuned | Δ (Change) |
|---|---|---|---|
| Spearman (ρ) | 0.6814 | 0.6839 | +0.0025 |
| Pearson (r) | 0.8535 | 0.9096 | +0.0561 |
The gain is concentrated in linear correlation (Pearson), while rank correlation (Spearman) is essentially unchanged; in practice this means better-calibrated cosine similarity scores for Turkish-language retrieval and ranking pipelines.
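As a rough sketch of how such numbers can be reproduced with sentence-transformers' built-in `EmbeddingSimilarityEvaluator` (the exact Turkish STS split and its gold scores are not published here, so the pairs below are illustrative placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("nezahatkorkmaz/turkce-embedding-bge-m3")

# Illustrative placeholder pairs; a real run would load the Turkish STS
# validation split with gold similarity scores scaled to [0, 1].
sentences1 = [
    "Türkiye'nin başkenti Ankara'dır",   # "The capital of Türkiye is Ankara"
    "Hava bugün çok güzel.",             # "The weather is very nice today."
    "Kedi koltukta uyuyor.",             # "The cat is sleeping on the couch."
]
sentences2 = [
    "Ankara Türkiye'nin başşehridir",    # "Ankara is Türkiye's capital city"
    "Bugün hava oldukça hoş.",           # "The weather is quite pleasant today."
    "Köpek bahçede koşuyor.",            # "The dog is running in the garden."
]
gold_scores = [0.95, 0.85, 0.20]

evaluator = EmbeddingSimilarityEvaluator(
    sentences1, sentences2, gold_scores, name="turkish-sts-dev"
)
# Depending on the sentence-transformers version, this returns either a single
# float (main metric) or a dict of Pearson/Spearman correlations per metric.
print(evaluator(model))
```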
## Quick Example
```python
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned Turkish embedding model
model = SentenceTransformer("nezahatkorkmaz/turkce-embedding-bge-m3")

# Two paraphrases: "The capital of Türkiye is Ankara" / "Ankara is Türkiye's capital city"
s1 = "Türkiye'nin başkenti Ankara'dır"
s2 = "Ankara Türkiye'nin başşehridir"

# Encode both sentences; normalized embeddings keep cosine scores in [-1, 1]
emb1, emb2 = model.encode([s1, s2], normalize_embeddings=True)

score = util.cos_sim(emb1, emb2).item()
print(f"Cosine similarity: {score:.4f}")
# Expected output ≈ 0.75–0.80
```
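Because the sentences are encoded with `normalize_embeddings=True`, cosine similarity on these vectors is mathematically identical to a plain dot product, so downstream vector stores that score by inner product will rank them identically without re-normalization.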