SentenceTransformer based on bowphs/SPhilBerta

This is a sentence-transformers model fine-tuned from bowphs/SPhilBerta to detect intertextual parallels between Latin passages. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: bowphs/SPhilBerta
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
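
Since the checkpoint is just a RobertaModel followed by attention-masked mean pooling, the same embeddings can be reproduced with plain transformers. The following is a minimal sketch, not part of the official usage; it assumes the repository's weights load through AutoModel (which holds for Sentence Transformers checkpoints):

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "julian-schelb/SPhilBerta-latin-intertextuality-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Query: fugit irreparabile tempus", "Candidate: sed fugit interea, fugit irreparabile tempus"]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state   # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()      # (batch, seq_len, 1)
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([2, 768])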

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("julian-schelb/SPhilBerta-latin-intertextuality-v1")
# Run inference
sentences = [
    'Query: Quia ergo insanivit Israel, et percussus fornicationis spiritu, incredibili furore bacchatus est, ideo non multo post tempore, sed dum propheto, dum spiritus hos regit artus, pascet eos Dominus quasi agnum in latitudine.',
    'Candidate: Te solum in bella secutus,  Post te fata sequar: neque enim sperare secunda  Fas mihi, nec liceat.',
    'Candidate:  ut tuus amicus, Crasse, Granius non esse sextantis.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
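
The prompts the model was trained with ({'query': 'Query: ', 'match': 'Candidate: '}; see Training Hyperparameters below) are applied manually in the snippet above. Assuming the prompt configuration is saved with the checkpoint, encode() can prepend them for you via prompt_name; a minimal retrieval-style sketch:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("julian-schelb/SPhilBerta-latin-intertextuality-v1")

query = "sed fugit interea, fugit irreparabile tempus"
candidates = [
    "Te solum in bella secutus, post te fata sequar.",
    "factus olor niveis pendebat in aere pennis.",
]

# prompt_name looks up the stored prompt string and prepends it before encoding.
query_emb = model.encode([query], prompt_name="query")
cand_embs = model.encode(candidates, prompt_name="match")

# Rank candidates by cosine similarity to the query.
scores = model.similarity(query_emb, cand_embs)[0]
for idx in scores.argsort(descending=True):
    print(f"{float(scores[idx]):.4f}  {candidates[int(idx)]}")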

Evaluation

Metrics

Binary Classification

Metric                      Value
cosine_accuracy             0.9598
cosine_accuracy_threshold   0.6652
cosine_f1                   0.7513
cosine_f1_threshold         0.6329
cosine_precision            0.8353
cosine_recall               0.6827
cosine_ap                   0.8119
cosine_mcc                  0.7336
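
These operating points can be used to turn similarity scores into binary intertextuality decisions. A minimal sketch using the F1-optimal threshold from the table (swap in 0.6652, the accuracy-optimal threshold, to maximize accuracy instead):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("julian-schelb/SPhilBerta-latin-intertextuality-v1")

F1_THRESHOLD = 0.6329  # cosine_f1_threshold from the table above

def is_intertextual(query: str, candidate: str) -> bool:
    # Apply the same prompts the model was trained with, then threshold the cosine score.
    embeddings = model.encode([f"Query: {query}", f"Candidate: {candidate}"])
    score = float(model.similarity(embeddings[0:1], embeddings[1:2])[0][0])
    return score >= F1_THRESHOLD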

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,895 training samples
  • Columns: query, match, and label
  • Approximate statistics based on the first 1000 samples:
              query               match               label
    type      string              string              int
    details   min: 6 tokens       min: 6 tokens       0: ~91.70%
              mean: 41.53 tokens  mean: 32.4 tokens   1: ~8.30%
              max: 128 tokens     max: 128 tokens
  • Samples:
    1. query: Query: quod et illustris poeta testatur dicens: sed fugit interea, fugit irreparabile tempus et iterum: Rhaebe, diu, res si qua diu mortalibus ulla est, uiximus.
       match: Candidate: omnino si ego evolo mense Quintili in Graeciam, sunt omnia faciliora; sed cum sint ea tempora ut certi nihil esse possit quid honestum mihi sit, quid liceat, quid expediat, quaeso, da operam ut illum quam honestissime copiosissimeque tueamur.
       label: 0
    2. query: Query: Non solum in Ecclesia morantur oves, nec mundae tantum aves volitant; sed frumentum in agro seritur, interque nitentia culta Lappaeque et tribuli, et steriles dominantur avenae.
       match: Candidate: atque hoc in loco, si facultas erit, exemplis uti oportebit, quibus in simili excusatione non sit ignotum, et contentione, magis illis ignoscendum fuisse, et deliberationis partibus, turpe aut inutile esse concedi eam rem, quae ab adversario commissa sit: permagnum esse et magno futurum detrimento, si ea res ab iis, qui potestatem habent vindicandi, neglecta sit.
       label: 0
    3. query: Query: adiuratus enim per eundem patrem et spes surgentis Iuli, nequaquam pepercit tums accensus et ira.
       match: Candidate: factus olor niveis pendebat in aere pennis.
       label: 0
  • Loss: OnlineContrastiveLoss
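
OnlineContrastiveLoss is the sentence-transformers variant of the contrastive objective that, within each batch, keeps only the hard pairs: positive pairs scoring below the best-scoring negative, and negative pairs scoring above the worst-scoring positive. On those pairs it applies, up to a constant factor, the standard contrastive loss with cosine distance and margin m (0.5 by default in sentence-transformers):

\mathcal{L}(u, v, y) = y \, d(u, v)^2 + (1 - y) \, \max(0,\; m - d(u, v))^2, \qquad d(u, v) = 1 - \cos(u, v)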

Evaluation Dataset

Unnamed Dataset

  • Size: 1,144 evaluation samples
  • Columns: query, match, and label
  • Approximate statistics based on the first 1000 samples:
              query               match               label
    type      string              string              int
    details   min: 8 tokens       min: 6 tokens       0: ~91.10%
              mean: 39.04 tokens  mean: 32.47 tokens  1: ~8.90%
              max: 121 tokens     max: 128 tokens
  • Samples:
    1. query: Query: qui uero pauperes sunt et tenui substantiola uidenturque sibi scioli, pomparum ferculis similes procedunt ad publicum, ut caninam exerceant facundiam.
       match: Candidate: cogitat reliquas colonias obire.
       label: 0
    2. query: Query: nec uarios discet mentiri lana colores, ipse sed in pratis aries iam suaue rubenti- murice, iam croceo mutabit uellera luto, sponte sua sandyx pascentis uestiet agnos.
       match: Candidate: loquitur ad voluntatem; quicquid denunciatum est, facit, assectatur, assidet, muneratur.
       label: 0
    3. query: Query: credite experto, quasi Christianus Christianis loquor: uenenata sunt illius dogmata, aliena a scripturis sanctis, uim scripturis facientia.
       match: Candidate: ignoscunt mihi, revocant in consuetudinem pristinam te que, quod in ea permanseris, sapientiorem quam me dicunt fuisse.
       label: 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 4
  • warmup_steps: 1958
  • prompts: {'query': 'Query: ', 'match': 'Candidate: '}
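
Combined with OnlineContrastiveLoss, these values map onto the Sentence Transformers v3+ training API roughly as sketched below. This is a reconstruction under stated assumptions, not the author's actual script; the inline dataset is a placeholder for the real query/match/label pairs described above:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("bowphs/SPhilBerta")

# Placeholder pairs; the real datasets have 4,895 / 1,144 rows with these columns.
train_dataset = Dataset.from_dict({
    "query": ["fugit irreparabile tempus", "credite experto"],
    "match": ["sed fugit interea, fugit irreparabile tempus", "cogitat reliquas colonias obire"],
    "label": [1, 0],
})
eval_dataset = train_dataset  # stand-in; use the held-out split in practice

args = SentenceTransformerTrainingArguments(
    output_dir="SPhilBerta-latin-intertextuality-v1",
    overwrite_output_dir=True,
    eval_strategy="steps",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=4,
    warmup_steps=1958,
    prompts={"query": "Query: ", "match": "Candidate: "},
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=OnlineContrastiveLoss(model),
)
trainer.train()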

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 1958
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'query': 'Query: ', 'match': 'Candidate: '}
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch     Step   Training Loss   Validation Loss   latin_intertext_cosine_ap
0.6494      50          0.6022            0.1430                      0.7392
1.2987     100          0.5519            0.1191                      0.7579
1.9481     150          0.4728            0.1021                      0.7794
2.5974     200          0.4001            0.0934                      0.7917
3.2468     250          0.2689            0.0917                      0.8048
3.8961     300          0.2210            0.0834                      0.8119

Framework Versions

  • Python: 3.10.8
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}