Pipeline: Sentence Similarity
Tags: sentence-transformers, Safetensors, xlm-roberta, feature-extraction, dense, turkish, semantic-search, text-embeddings-inference

πŸ‡ΉπŸ‡· Turkish Embedding Model (bge-m3 Fine-tuned)

This model is a Turkish fine-tuned version of BAAI/bge-m3, optimized for Turkish semantic similarity, retrieval, and RAG (Retrieval-Augmented Generation) tasks.
It maps Turkish sentences and paragraphs into a 1024-dimensional dense vector space.


Model Overview

| Property | Value |
|---|---|
| Base Model | BAAI/bge-m3 |
| Architecture | XLM-RoBERTa + Pooling + Normalize |
| Embedding Dimension | 1024 |
| Max Sequence Length | 8192 |
| Similarity Function | Cosine |
| Loss Functions | MultipleNegativesRankingLoss + TripletLoss |
| Language | Turkish 🇹🇷 |
| Use Cases | Semantic Search, Text Similarity, RAG, Clustering |
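Because the similarity function is cosine over L2-normalized embeddings, scoring two vectors reduces to a dot product. A minimal, model-free sketch of that scoring step (the small vectors below are illustrative stand-ins for the 1024-dimensional embeddings):

```python
# Sketch of cosine-similarity scoring: after L2 normalization,
# cosine similarity is just the dot product of the two vectors.
import numpy as np

def cosine_sim(a, b):
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])  # parallel to v1, so similarity is 1.0
print(cosine_sim(v1, v2))
```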

Evaluation Results

The model was evaluated on a Turkish Semantic Textual Similarity (STS) dataset.
Compared to the base multilingual BGE-M3 model, the fine-tuned model shows a notable improvement in Pearson correlation, indicating better alignment between cosine similarity scores and human judgments.

| Metric | Base (BAAI/bge-m3) | Fine-tuned | Δ (Change) |
|---|---|---|---|
| Spearman (ρ) | 0.6814 | 0.6839 | +0.0025 |
| Pearson (r) | 0.8535 | 0.9096 | +0.0561 |

The model demonstrates higher linear correlation on Turkish STS benchmarks, producing more consistent semantic scores for Turkish-language retrieval and ranking tasks.
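For reference, a small self-contained sketch of how the two STS metrics above are typically computed: correlate the model's similarity scores against human gold labels. The gold labels and model scores below are illustrative, not the actual evaluation data.

```python
# Pearson r: linear correlation of raw scores.
# Spearman rho: Pearson correlation of the ranks (monotonic agreement).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    # No tied values here, so a simple sorted-index rank suffices.
    rank = lambda v: [sorted(v).index(e) for e in v]
    return pearson(rank(x), rank(y))

gold   = [0.10, 0.40, 0.55, 0.80, 0.95]  # human judgments (illustrative)
scores = [0.15, 0.35, 0.60, 0.75, 0.90]  # model cosine similarities (illustrative)
print(round(pearson(gold, scores), 4), round(spearman(gold, scores), 4))
```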


Quick Example

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nezahatkorkmaz/turkce-embedding-bge-m3")

s1 = "Türkiye'nin başkenti Ankara'dır"
s2 = "Ankara Türkiye'nin başşehridir"

# Encode both sentences; with normalize_embeddings=True the vectors are unit-length.
emb1, emb2 = model.encode([s1, s2], normalize_embeddings=True)
score = util.cos_sim(emb1, emb2).item()
print(f"Cosine similarity: {score:.4f}")
# Expected output ≈ 0.75–0.80
```
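Beyond pairwise similarity, the same embeddings can drive the retrieval step of semantic search or RAG: encode the corpus once, then rank documents by dot product against the query embedding. A minimal sketch, with small dummy unit vectors standing in for the model's 1024-dimensional outputs:

```python
# Retrieval sketch: rank normalized corpus embeddings by dot product
# against a query embedding and keep the top-k hits.
import numpy as np

def top_k(query_emb, corpus_embs, k=2):
    scores = corpus_embs @ query_emb          # cosine sim for unit vectors
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query = np.array([1.0, 0.0, 0.0, 0.0])

print(top_k(query, corpus))  # documents 0 and 1 rank first
```

In practice the corpus and query would be encoded with `model.encode(..., normalize_embeddings=True)`, and for large corpora the brute-force dot product would be replaced by a vector index.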
