metadata
base_model: jinaai/jina-clip-v2
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:63802
- loss:CoSENTLoss
widget:
- source_sentence: машинка детская самоходная бибикар желтый
sentences:
- 'машинка детская красная бибикар '
- моторное масло alpine dx1 5w 30 5л 0101662
- 'спинбайк schwinn ic7 '
- source_sentence: 'велосипед stels saber 20 фиолетовый '
sentences:
- 'детские спортивные комплексы '
- 'велосипед bmx stels saber 20 v010 2020 '
- 50218 кабель ugreen hd132 hdmi zinc alloy optical fiber cable черный 40m
- source_sentence: гидравличесские прессы
sentences:
- пресс гидравлический ручной механизмом
- ракетка для настольного тенниса fora 7
- 'объектив panasonic 20mm f1 7 asph ii h h020ae k '
- source_sentence: >-
бокс пластиковый монтажной платой щмп п 300х200х130 мм ip65 proxima ящики
щитки шкафы
sentences:
- >-
батарейный отсек для 4xаа открытый проволочные выводы разъем dcx2 1
battery holder 4xaa 6v dc
- 'bugera bc15 '
- >-
бокс пластиковый монтажной платой щмп п 500х350х190 мм ip65 proxima
ящики щитки шкафы
- source_sentence: 'honor watch gs pro black '
sentences:
- 'honor watch gs pro white '
- трансформер pituso carlo hb gy 06 lemon
- 'электровелосипед колхозник volten greenline 500w '
model-index:
- name: SentenceTransformer based on jinaai/jina-clip-v2
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: example dev
type: example-dev
metrics:
- type: pearson_cosine
value: 0.46018545926876964
name: Pearson Cosine
- type: spearman_cosine
value: 0.4873837299726027
name: Spearman Cosine
SentenceTransformer based on jinaai/jina-clip-v2
This is a sentence-transformers model fine-tuned from jinaai/jina-clip-v2. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: jinaai/jina-clip-v2
- Maximum Sequence Length: not specified
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(transformer): Transformer(
(model): JinaCLIPModel(
(text_model): HFTextEncoder(
(transformer): XLMRobertaLoRA(
(roberta): XLMRobertaModel(
(embeddings): XLMRobertaEmbeddings(
(word_embeddings): ParametrizedEmbedding(
250002, 1024, padding_idx=1
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(token_type_embeddings): ParametrizedEmbedding(
1, 1024
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
)
(emb_drop): Dropout(p=0.1, inplace=False)
(emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder): XLMRobertaEncoder(
(layers): ModuleList(
(0-23): 24 x Block(
(mixer): MHA(
(rotary_emb): RotaryEmbedding()
(Wqkv): ParametrizedLinearResidual(
in_features=1024, out_features=3072, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(inner_attn): SelfAttention(
(drop): Dropout(p=0.1, inplace=False)
)
(inner_cross_attn): CrossAttention(
(drop): Dropout(p=0.1, inplace=False)
)
(out_proj): ParametrizedLinear(
in_features=1024, out_features=1024, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
)
(dropout1): Dropout(p=0.1, inplace=False)
(drop_path1): StochasticDepth(p=0.0, mode=row)
(norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): ParametrizedLinear(
in_features=1024, out_features=4096, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(fc2): ParametrizedLinear(
in_features=4096, out_features=1024, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
)
(dropout2): Dropout(p=0.1, inplace=False)
(drop_path2): StochasticDepth(p=0.0, mode=row)
(norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
)
)
(pooler): MeanPooler()
(proj): Identity()
)
(vision_model): EVAVisionTransformer(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
)
(pos_drop): Dropout(p=0.0, inplace=False)
(rope): VisionRotaryEmbeddingFast()
(blocks): ModuleList(
(0-23): 24 x Block(
(norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(attn): Attention(
(q_proj): Linear(in_features=1024, out_features=1024, bias=False)
(k_proj): Linear(in_features=1024, out_features=1024, bias=False)
(v_proj): Linear(in_features=1024, out_features=1024, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(inner_attn_ln): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(proj): Linear(in_features=1024, out_features=1024, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(rope): VisionRotaryEmbeddingFast()
)
(drop_path): Identity()
(norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(mlp): SwiGLU(
(w1): Linear(in_features=1024, out_features=2730, bias=True)
(w2): Linear(in_features=1024, out_features=2730, bias=True)
(act): SiLU()
(ffn_ln): LayerNorm((2730,), eps=1e-06, elementwise_affine=True)
(w3): Linear(in_features=2730, out_features=1024, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(head): Identity()
(patch_dropout): PatchDropout()
)
(visual_projection): Identity()
(text_projection): Identity()
)
)
(normalizer): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
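# trust_remote_code=True is assumed to be required here, since the base model
# jinaai/jina-clip-v2 ships custom modeling code
model = SentenceTransformer("seregadgl/t12", trust_remote_code=True)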
# Run inference
sentences = [
'honor watch gs pro black ',
'honor watch gs pro white ',
'трансформер pituso carlo hb gy 06 lemon',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
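The architecture above ends in a Normalize() module and the card lists cosine similarity as the similarity function, so the similarity matrix is effectively a matrix of dot products between unit-length vectors. The following is a minimal pure-Python sketch of that computation, not the library's implementation. The 3-dimensional vectors are made-up stand-ins for the 1024-dimensional embeddings, and the helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarities as a list of rows."""
    unit = [l2_normalize(v) for v in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


# Made-up 3-dimensional stand-ins for three sentence embeddings.
toy_embeddings = [
    [0.9, 0.1, 0.0],  # e.g. "honor watch gs pro black"
    [0.8, 0.2, 0.1],  # e.g. "honor watch gs pro white"
    [0.0, 0.1, 0.9],  # e.g. an unrelated product
]

sims = cosine_similarity_matrix(toy_embeddings)
for row in sims:
    print([round(s, 3) for s in row])
```

The diagonal is 1 because every vector is compared with itself, and the two similar toy vectors score far higher with each other than with the third.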
Evaluation
Metrics
Semantic Similarity
- Dataset: example-dev
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.4602 |
| spearman_cosine | 0.4874 |
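The two metrics above are correlations between the model's cosine scores and the gold labels: Pearson on the raw values, Spearman on their ranks. The sketch below shows how such values can be computed in plain Python. It is an illustration with made-up scores and labels, not the evaluator's own code, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up cosine scores and binary labels for six pairs.
cosine_scores = [0.91, 0.35, 0.78, 0.12, 0.55, 0.40]
gold_labels = [1, 0, 1, 0, 0, 0]

print("pearson_cosine:", round(pearson(cosine_scores, gold_labels), 4))
print("spearman_cosine:", round(spearman(cosine_scores, gold_labels), 4))
```

In this toy data the scores separate the two labels perfectly, yet Spearman stays below 1 because the tied binary labels share ranks.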
Training Details
Training Dataset
Unnamed Dataset
- Size: 63,802 training samples
- Columns: doc, candidate, and label
- Approximate statistics based on the first 1000 samples:

| | doc | candidate | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 5 characters; mean: 40.56 characters; max: 115 characters | min: 4 characters; mean: 40.11 characters; max: 115 characters | 0: ~85.20%; 1: ~14.80% |

- Samples:

| doc | candidate | label |
|---|---|---|
| массажер xiaomi massage gun eu bhr5608eu | перкуссионный массажер xiaomi massage gun mini bhr6083gl | 0 |
| безударная дрель ingco ed50028 | ударная дрель ingco id211002 | 0 |
| жидкость old smuggler 30мл 20мг | жидкость old smuggler salt 30ml marlboro 20mg | 0 |

- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
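CoSENTLoss is a ranking loss: for every two pairs in a batch where one has a higher label than the other, it penalizes the case where the lower-labelled pair receives the higher cosine similarity. The sketch below follows the published formulation, log(1 + Σ exp(scale · (cos_low − cos_high))), in plain Python with the scale of 20.0 listed above. It is an illustrative sketch under that assumption rather than the library's implementation, and the function name is hypothetical.

```python
import math


def cosent_loss(cos_sims, labels, scale=20.0):
    """CoSENT-style loss over one batch of (cosine similarity, label) pairs."""
    # Start with 0.0 so that exp(0) supplies the "1 +" inside the logarithm.
    terms = [0.0]
    for sim_high, label_high in zip(cos_sims, labels):
        for sim_low, label_low in zip(cos_sims, labels):
            if label_high > label_low:
                # Large when the lower-labelled pair scores above the higher one.
                terms.append(scale * (sim_low - sim_high))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Made-up batch: the matching pair (label 1) scores above the non-matching one.
well_ordered = cosent_loss([0.9, 0.1], [1, 0])
# Same labels, but the similarities are the wrong way round.
badly_ordered = cosent_loss([0.1, 0.9], [1, 0])

print(well_ordered, badly_ordered)
```

A batch whose labels are all equal contributes no comparison terms, so its loss is log(1) = 0. With roughly 85% of labels equal to 0 in this dataset, only the comparisons between label-1 and label-0 pairs drive the gradient.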
Evaluation Dataset
Unnamed Dataset
- Size: 7,090 evaluation samples
- Columns: doc, candidate, and label
- Approximate statistics based on the first 1000 samples:

| | doc | candidate | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 4 characters; mean: 40.68 characters; max: 198 characters | min: 5 characters; mean: 39.92 characters; max: 178 characters | 0: ~84.20%; 1: ~15.80% |

- Samples:

| doc | candidate | label |
|---|---|---|
| круглое пляжное парео селфи коврик пляжная подстилка пляжное покрывало пляжный коврик пироженко | круглое пляжное парео селфи коврик пляжная подстилка пляжное покрывало пляжный коврик клубника | 0 |
| аккумулятор батарея для ноутбука asus g751 | аккумулятор батарея для ноутбука asus g75 series | 0 |
| миксер bosch mfq3520 mfq 3520 | миксер bosch mfq 4020 | 0 |

- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- load_best_model_at_end: True
- batch_sampler: no_duplicates
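The learning rate, cosine scheduler, and warmup ratio combine into a schedule that ramps linearly from 0 to 2e-05 over the first 10% of steps and then decays along a half cosine. The sketch below reproduces that shape in plain Python. The total step count is an assumption derived from 63,802 samples at batch size 16 on a single device (the no_duplicates sampler may change it slightly), and the function name is hypothetical.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count: one epoch over 63,802 samples in batches of 16.
total_steps = math.ceil(63802 / 16)

for step in (0, 200, 399, 2000, total_steps):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

Under these assumptions the peak learning rate is reached at step 399, before the first logged evaluation at step 500.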
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | example-dev_spearman_cosine |
|---|---|---|---|---|
| 0 | 0 | - | - | 0.0849 |
| 0.1254 | 500 | 3.7498 | 3.0315 | 0.3797 |
| 0.2508 | 1000 | 2.7653 | 2.7538 | 0.4508 |
| 0.3761 | 1500 | 2.5938 | 2.7853 | 0.4689 |
| 0.5015 | 2000 | 2.6425 | 2.6761 | 0.4800 |
| 0.6269 | 2500 | 2.6859 | 2.6341 | 0.4840 |
| 0.7523 | 3000 | 2.5805 | 2.6350 | 0.4855 |
| 0.8776 | 3500 | 2.7247 | 2.6087 | 0.4874 |
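Because load_best_model_at_end is True, the checkpoint kept at the end of training is the best one seen at an evaluation step, not necessarily the last. The sketch below picks that checkpoint from the logged rows, assuming the comparison uses validation loss (the Trainer default when no other metric is configured; the card does not list that setting). The tuple layout and variable names are illustrative.

```python
# (step, training_loss, validation_loss, spearman_cosine), copied from the log table.
# The step-0 row is omitted because it has no loss values.
training_log = [
    (500, 3.7498, 3.0315, 0.3797),
    (1000, 2.7653, 2.7538, 0.4508),
    (1500, 2.5938, 2.7853, 0.4689),
    (2000, 2.6425, 2.6761, 0.4800),
    (2500, 2.6859, 2.6341, 0.4840),
    (3000, 2.5805, 2.6350, 0.4855),
    (3500, 2.7247, 2.6087, 0.4874),
]

# Assumed selection rule: lowest validation loss wins.
best_step, _, best_val_loss, best_spearman = min(training_log, key=lambda row: row[2])

print(f"best step: {best_step}, validation loss: {best_val_loss}, "
      f"spearman_cosine: {best_spearman}")
```

Under that assumption the selected checkpoint is step 3500, whose Spearman value of 0.4874 matches the figure reported in the Evaluation section.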
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.3.1
- Transformers: 4.46.3
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}