SentenceTransformer based on sentence-transformers/multi-qa-MiniLM-L6-cos-v1

This is a sentence-transformers model finetuned from sentence-transformers/multi-qa-MiniLM-L6-cos-v1. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the ๐Ÿค— Hub
model = SentenceTransformer("LamaDiab/MultiMiniLM-V25Data-256BATCH-SemanticEngine")
# Run inference
sentences = [
    'golden olive pouch',
    'pouch',
    'trio kaftan',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7852, 0.1012],
#         [0.7852, 1.0000, 0.0358],
#         [0.1012, 0.0358, 1.0000]])

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9699

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,006,385 training samples
  • Columns: anchor, positive, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    anchor positive itemCategory
    type string string string
    details
    • min: 3 tokens
    • mean: 11.94 tokens
    • max: 72 tokens
    • min: 3 tokens
    • mean: 4.57 tokens
    • max: 83 tokens
    • min: 3 tokens
    • mean: 3.91 tokens
    • max: 9 tokens
  • Samples:
    anchor positive itemCategory
    lice repellent serum hair serum hair serum
    vanilla sponge cake with fresh moisturizer and strawberry pieces.
    serve person.
    vanilla tres leches sweet
    wyl chips - kettle cooked sea salt & balsamic vinegar potato chips - gr snacks snacks
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 9,509 evaluation samples
  • Columns: anchor, positive, negative, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    anchor positive negative itemCategory
    type string string string string
    details
    • min: 3 tokens
    • mean: 9.63 tokens
    • max: 43 tokens
    • min: 3 tokens
    • mean: 6.3 tokens
    • max: 150 tokens
    • min: 3 tokens
    • mean: 9.5 tokens
    • max: 35 tokens
    • min: 3 tokens
    • mean: 3.88 tokens
    • max: 10 tokens
  • Samples:
    anchor positive negative itemCategory
    pilot mechanical pencil progrex h-127 - 0.7 mm pilot pencil lunch bag colors 22 ร— 16 ร— 28 cm must shark 000586181 pencil
    superior drawing marker -pen - set of 12 colors - 2 nib marker pen staedtler triplus fineliner 10 + 3 pack marker
    first person singular author: haruki murakami first person singular book misty grater stainless steel literature and fiction
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • dataloader_persistent_workers: True
  • push_to_hub: True
  • hub_model_id: LamaDiab/MultiMiniLM-V25Data-256BATCH-SemanticEngine
  • hub_strategy: all_checkpoints

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: LamaDiab/MultiMiniLM-V25Data-256BATCH-SemanticEngine
  • hub_strategy: all_checkpoints
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss cosine_accuracy
0.0003 1 3.7722 - -
0.2543 1000 2.8712 0.5578 0.9445
0.5086 2000 1.8052 0.5062 0.9498
0.7630 3000 1.2908 0.4583 0.9575
1.0173 4000 1.0201 0.4365 0.9621
1.2715 5000 1.2224 0.4258 0.9627
1.5257 6000 1.1341 0.4155 0.9645
1.7799 7000 1.0798 0.4215 0.9657
2.0341 8000 1.0249 0.4113 0.9681
2.2883 9000 0.9443 0.4053 0.9673
2.5425 10000 0.9178 0.3976 0.9688
2.7966 11000 0.8991 0.4033 0.9703
3.0508 12000 0.8777 0.4056 0.9698
3.3050 13000 0.8326 0.4063 0.9693
3.5592 14000 0.8178 0.4036 0.9693
3.8134 15000 0.8067 0.3971 0.9695
4.0676 16000 0.8047 0.3984 0.9702
4.3218 17000 0.762 0.3980 0.9697
4.5760 18000 0.772 0.3997 0.9695
4.8302 19000 0.7655 0.3972 0.9699

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.2
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Downloads last month
55
Safetensors
Model size
22.7M params
Tensor type
F32
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for LamaDiab/MultiMiniLM-V25Data-256BATCH-SemanticEngine

Finetuned
(24)
this model

Evaluation results