SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on a triplet dataset loaded from Parquet files (listed as "parquet" below). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • parquet


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
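The Pooling module above has only pooling_mode_mean_tokens enabled, and Normalize() then rescales each sentence vector to unit length. The minimal NumPy sketch below shows what those two modules compute, assuming the transformer has already produced token embeddings and an attention mask. The 4-dimensional toy values and the helper names are made up for illustration; MPNet itself produces 768-dimensional token vectors.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sequence, ignoring padding positions (mask == 0)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero for empty sequences
    return summed / counts


def l2_normalize(x):
    """Scale every row to unit Euclidean length, as the Normalize() module does."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy batch: 2 sequences x 3 token positions x 4 dims (the real model uses 768 dims).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # 3rd position is padding
    [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 2.0], [0.0, 0.0, 2.0, 2.0]],
])
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

embeddings = l2_normalize(mean_pool(tokens, attention_mask))
print(embeddings.round(4))
# With unit-length rows, the dot product is the cosine similarity.
print(embeddings @ embeddings.T)
```

Because every output vector has unit length, the cosine similarity listed as the model's similarity function reduces to a plain dot product, so a whole similarity matrix is a single matrix multiplication.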

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vaios-stergio/all-mpnet-base-v2-dblp-aminer-50k-triplets-cachedgistembedloss")
# Run inference
sentences = [
    'Metabolite Identification Using Artificial Neural Network Metabolite identification is a major bottleneck in mass spectrometrybased metabolomic studies. In a typical untargeted metabolomic study that uses tandem MS (MSMS), the identities of metabolites are determined by matching the experimental MSMS data against those in spectral libraries. However, existing spectral libraries cover only a small fraction of known compounds. We introduce a neural network model to predict the identity of analytes whose reference measurements are not available in spectral libraries. Specifically, an artificial neural network model is trained to predict the fingerprints of compounds based on their experimental MSMS spectra. Candidates are retrieved from metabolite databases based on molecular formula or precursor mass. The candidate with the most similar fingerprint is chosen as the predicted one. The method was evaluated via an independent dataset. The results show that our neural network model improves MSMSbased metabolite identification accuracy by a considerable margin compared to previous computational methods, such as Metfrag and CSI:FingerID.',
    'Efficient Shared Peak Counting in Database Peptide Search Using Compact Data Structure for FragmentIon Index Database search is the most commonly employed method for identification of peptides from MSMS spectra data. The search involves comparing experimentally obtained MSMS spectra against a set of theoretical spectra predicted from a protein sequence database. One of the most commonly employed similarity metrics for spectral comparison is the sharedpeak count between a pair of MSMS spectra. Most modern methods index all generated fragmention data from theoretical spectra to speed up the shared peak count computations between a given experimental spectrum and all theoretical spectra. However, the bottleneck for this method is the gigantic memory footprint of fragmention index that leads to nonscalable solutions. In this paper, we present a novel data structure, called Compact FragmentIon Index Representation (CFIR), that efficiently compresses highly redundant ionmass information in the data to reduce the index size. Our proposed data structure outperforms all existing fragmention indexing data structures by at least 2xc3x97 in memory consumption while exhibiting the same time complexity for index construction and peptide search. The results also show comparable indexing speed, search speed and speedup scalability for CFIRindex and the stateoftheart algorithms.',
    'Rickshaw Buddy RICKSHAW BUDDY is a lowcost automated assistance system for threewheeler auto rickshaws to reduce the high rate of accidents in the streets of developing countries like Bangladesh. It is a given fact that the lack of over speed alert, back camera, detection of rear obstacle and delay of maintenance are causes behind fatal accidents. These systems are absent not only in auto rickshaws but also most public transports. For this system, surveys have been done in different phases among the passengers, drivers and even the conductors for a useful and successful result. Since the system is very cheap, the lowincome drivers and owners of vehicles will be able to afford it easily making road safety the first and foremost priority.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6391, 0.0971],
#         [0.6391, 1.0000, 0.0272],
#         [0.0971, 0.0272, 1.0000]])
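The similarity matrix can be used directly for paraphrase mining: for each sentence, pick the most similar other sentence. The sketch below does this in NumPy on the matrix printed above (values copied from that output; with a live run you could pass similarities.numpy() instead). The library also provides util.paraphrase_mining and util.semantic_search for larger collections. This sketch only illustrates the idea and is not their implementation; the helper name nearest_other is hypothetical.

```python
import numpy as np

# Cosine similarities copied from the printed output above (3 sentences).
similarities = np.array([
    [1.0000, 0.6391, 0.0971],
    [0.6391, 1.0000, 0.0272],
    [0.0971, 0.0272, 1.0000],
])


def nearest_other(sim):
    """Return, for each row, the index and score of the most similar *other* item."""
    sim = sim.copy()
    np.fill_diagonal(sim, -np.inf)  # ignore each sentence's similarity to itself
    best = sim.argmax(axis=1)
    return best, sim[np.arange(len(sim)), best]


best, scores = nearest_other(similarities)
for i, (j, s) in enumerate(zip(best, scores)):
    print(f"sentence {i} -> sentence {j} (cosine {s:.4f})")
```

The two MS/MS-related abstracts pair up with each other, while the best match for the Rickshaw Buddy abstract scores only 0.0971.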

Training Details

Training Dataset

parquet

  • Dataset: parquet
  • Size: 38,864 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 136 tokens, mean 248.76 tokens, max 384 tokens
    • positive (string): min 126 tokens, mean 256.93 tokens, max 384 tokens
    • negative (string): min 126 tokens, mean 237.9 tokens, max 384 tokens
  • Samples (the three training triplets shown share the same anchor and positive and differ only in the negative):
    • anchor: The longterm effect of media violence exposure on aggression of youngsters Abstract The effect of media violence on aggression has always been a trending issue, and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger, this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters, as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure (HMVE) exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure (LMVE). After being provoked, the anger of all participants was significantly increased, and the anger and p...
    • positive: Link between Facial Expressions and Emotional States Induced by Exposure to Multimedia Content The explosive growth of digital videos has created new challenges for computer science. While many advances on video indexing, retrieval and summarization based on general, subjectindependent, objective descriptors have been made in the past years, research on the use of individual subjective preferences and affective states is at the forefront of research and poses great challenges. In this article, we study the relationship between emotional states reported by viewers and their facial physiological changes observed during the display of different video genres. A dataset of twenty videos was created from YouTube video sharing platform. During the exhibition of the videos, the viewer’s facial activities have been recorded and analyzed by means of Action Units (AUs). After that, emotional states selfreported by the viewers were assigned to video shots. Labels were divided into four cat...
    • negative (triplet 1): Unmanned agricultural product sales system The invention relates to the field of agricultural product sales, provides an unmanned agricultural product sales system, and aims to solve the problem of agricultural product waste caused by the fact that most farmers can only prepare goods according to guessing and experiences when selling agricultural products at present. The unmanned agricultural product sales system comprises an acquisition module for acquiring selection information of customers; a storage module which prestores a vegetable preparation scheme; a matching module which is used for matching a corresponding side dish scheme from the storage module according to the selection information of the client; a pushing module which is used for pushing the matched side dish scheme back to the client; an acquisition module which is also used for acquiring confirmation information of a client; an order module which is used for generating order information according to the confirmation infor...
    • negative (triplet 2): Rickshaw Buddy RICKSHAW BUDDY is a lowcost automated assistance system for threewheeler auto rickshaws to reduce the high rate of accidents in the streets of developing countries like Bangladesh. It is a given fact that the lack of over speed alert, back camera, detection of rear obstacle and delay of maintenance are causes behind fatal accidents. These systems are absent not only in auto rickshaws but also most public transports. For this system, surveys have been done in different phases among the passengers, drivers and even the conductors for a useful and successful result. Since the system is very cheap, the lowincome drivers and owners of vehicles will be able to afford it easily making road safety the first and foremost priority.
    • negative (triplet 3): Minimum number of additive tuples in groups of prime order For a prime number p and a sequence of integers a0, ..., ak ∈ {0, 1, ..., p}, let s(a0, ..., ak) be the minimum number of (k+1)-tuples (x0, ..., xk) ∈ A0 × ··· × Ak with x0 = x1 + ··· + xk, over subsets A0, ..., Ak ⊆ Zp of sizes a0, ..., ak respectively. We observe that an elegant argument of Samotij and Sudakov can be extended to show that there exists an extremal configuration with all sets Ai being intervals of appropriate length. The same conclusion also holds for the related problem, posed by Bajnok, when a0 = ··· = ak =: a and A0 = ··· = Ak, provided k is not equal to 1 modulo p. Finally, by applying basic Fourier analysis, we show for Bajnok’s problem that if p > 13 and a ∈ {3, ..., p−3} are fixed while k ≡ 1 (mod p) tends to infinity, then the extremal configuration alternates between at least two affine nonequivalent sets.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {
        "guide": "SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')",
        "temperature": 0.01,
        "mini_batch_size": 32,
        "margin_strategy": "relative",
        "margin": 0.1,
        "contrast_anchors": true,
        "contrast_positives": true,
        "gather_across_devices": false
    }
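CachedGISTEmbedLoss is an in-batch contrastive loss in which a guide model (here all-MiniLM-L6-v2) vetoes candidate negatives that it considers too close to the anchor. The "Cached" part splits each batch into mini-batches of mini_batch_size to save memory and is designed to leave the loss value unchanged; with per_device_train_batch_size 16, each batch fits in a single mini-batch of 32. The NumPy sketch below is a simplified, non-cached illustration and not the library's implementation. It omits the contrast_anchors/contrast_positives terms. It also assumes that the "relative" margin strategy masks any candidate whose guide similarity exceeds (1 - margin) times the guide's anchor-positive similarity. Toy 2-D vectors stand in for both models' embeddings, and the function names are hypothetical.

```python
import numpy as np


def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def gist_triplet_loss(anchor, positive, negative,
                      guide_anchor, guide_positive, guide_negative,
                      temperature=0.01, margin=0.1):
    """Simplified guided in-batch loss (relative margin, no caching, no anchor/positive contrast terms)."""
    n = len(anchor)
    # Row i scores anchor i against every positive and every negative in the batch.
    scores = cos_sim(anchor, np.concatenate([positive, negative]))
    guide = cos_sim(guide_anchor, np.concatenate([guide_positive, guide_negative]))
    # Guide similarity between each anchor and its own positive, shape (n, 1).
    guide_pos = np.diag(guide[:, :n]).reshape(-1, 1)
    # Relative margin (assumed): drop candidates the guide rates above (1 - margin) * guide_pos.
    mask = guide > guide_pos * (1 - margin)
    mask[np.arange(n), np.arange(n)] = False  # never drop the true positive
    logits = np.where(mask, -np.inf, scores) / temperature
    # Cross-entropy where the correct column for row i is column i (its own positive).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), np.arange(n)].mean(), mask


# Toy 2-D embeddings; the same vectors are reused as the "guide" embeddings.
anchor = np.array([[1.0, 0.0], [0.0, 1.0]])
positive = np.array([[1.0, 0.1], [0.1, 1.0]])
negative = np.array([[1.0, 0.05], [-1.0, 0.0]])  # negative 0 is nearly identical to anchor 0

loss, mask = gist_triplet_loss(anchor, positive, negative, anchor, positive, negative)
print(loss)
print(mask)
```

In this toy batch the guide masks negative 0. Without that mask, its cosine with anchor 0 (about 0.999) would exceed the true positive's (about 0.995), and at temperature 0.01 it would dominate row 0 of the loss.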
    

Evaluation Dataset

parquet

  • Dataset: parquet
  • Size: 4,858 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 140 tokens, mean 244.64 tokens, max 384 tokens
    • positive (string): min 133 tokens, mean 249.07 tokens, max 384 tokens
    • negative (string): min 127 tokens, mean 235.29 tokens, max 384 tokens
  • Samples (the three evaluation triplets shown share the same anchor and positive and differ only in the negative):
    • anchor: Secrecy Energy Efficient Beamforming for SatelliteTerrestrial Coordinated Communication Systems This paper investigates the secrecy energy efficiency maximization (SEEM) problem in a satelliteterrestrial coordinated communication systems. First, we formulate a coordinated transmission optimization problem by using the secrecy energy efficiency (SEE) as optimization criterion and meanwhile the transmission quality of the terrestrial link and the transmission power of the satellite and ground base station as constraints. Due to the fractional form of SEE, the formulated optimization problem is nonconvex and mathematically intractable. Then, we transform the original fractional problem into an equivalent subtractive problem, and employ the difference of twoconvex functions (D.C.) approximation method to arrive at an approximate convex problem. Finally, we propose a convergence proven iterative algorithm to solve the modified maximization problem. Simulation results are presented to illust...
    • positive: On the Ergodic Secrecy Rate of Massive MIMO Transmission with Partial Legitimate User CSI In this paper, we consider the downlink transmission over a singlecell massive multipleinputmultipleoutput (MIMO) system in the presence of multiple singleantenna eavesdroppers (massive MIMOME). We concentrate on the practical scenario where partial channel state information (CSI) of legitimate users and no CSI of eavesdroppers are available at the base station and consider both types of eavesdroppers including the noncolluding and colluding eavesdroppers. Random unitary beamforming (RUB) based scheme is used to describe the partial CSI of legitimate users. We derive the closedform expression of ergodic secrecy rate for RUB based massive MIMOME transmission, and its single legitimate user particular case. We also present numerical results to illustrate the performancecomplexity tradeoff among different massive MIMO transmission schemes. We show that RUB based scheme can enhance secrecy performance...
    • negative (triplet 1): Rickshaw Buddy RICKSHAW BUDDY is a lowcost automated assistance system for threewheeler auto rickshaws to reduce the high rate of accidents in the streets of developing countries like Bangladesh. It is a given fact that the lack of over speed alert, back camera, detection of rear obstacle and delay of maintenance are causes behind fatal accidents. These systems are absent not only in auto rickshaws but also most public transports. For this system, surveys have been done in different phases among the passengers, drivers and even the conductors for a useful and successful result. Since the system is very cheap, the lowincome drivers and owners of vehicles will be able to afford it easily making road safety the first and foremost priority.
    • negative (triplet 2): Does a friendly robot make you feel better As robots are taking a more prominent role in our daily lives, it becomes increasingly important to consider how their presence influences us. Several studies have investigated effects of robot behavior on the extent to which that robot is positively evaluated. Likewise, studies have shown that the emotions a robot shows tend to be contagious: a happy robot makes us feel happy as well. It is unknown, however, whether the affect that people experience while interacting with a robot also influences their evaluation of the robot. This study aims to discover whether people’s affective and evaluative responses to a social robot are related. Results show that affective responses and evaluations are related, and that these effects are strongest when a robot shows meaningful motions. These results are consistent with earlier findings in terms of how people evaluate social robots.
    • negative (triplet 3): A Critical Look at the 2019 College Admissions Scandal Discusses the 2019 College admissions scandal. Let me begin with a disclaimer: I am making no legal excuses for the participants in the current scandal. I am only offering contextual background that places it in the broader academic, cultural, and political perspective required for understanding. It is only the most recent installment of a wellworn narrative: the controlling elite make their own rules and live by them, if they can get away with it. Unfortunately, some of the participants, who are either serving or facing jail time, didn’t know to not go into a gunfight with a sharp stick. Money alone is not enough to avoid prosecution for fraud: you need political clout. The best protection a defendant can have is a prosecutor who fears political reprisal. Compare how the Koch brothers escaped prosecution for stealing millions of oil dollars from Native American tribes1,2 with the fate of actresses Lori Loughlin and Felicit...
  • Loss: CachedGISTEmbedLoss with these parameters:
    {
        "guide": "SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')",
        "temperature": 0.01,
        "mini_batch_size": 32,
        "margin_strategy": "relative",
        "margin": 0.1,
        "contrast_anchors": true,
        "contrast_positives": true,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • optim: adamw_torch
  • batch_sampler: no_duplicates
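The training samples above repeat the same anchor and positive across several rows, each with a different negative. Because the loss uses the other rows in a batch as negatives, two such rows in one batch would make a row's own positive reappear among its negatives. Setting batch_sampler: no_duplicates avoids this by never placing the same text twice in a batch. The following is a simplified greedy sketch of that idea in plain Python. The function name and toy rows are hypothetical, and the library's sampler also randomizes the order of indices before batching.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group row indices so that no text occurs twice within a batch."""
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen = [], set()
        for idx in remaining:
            texts = set(rows[idx])
            if texts & seen:
                continue  # shares a text with this batch; defer to a later batch
            batch.append(idx)
            seen |= texts
            if len(batch) == batch_size:
                break
        batches.append(batch)
        taken = set(batch)
        remaining = [i for i in remaining if i not in taken]
    return batches


# Toy (anchor, positive, negative) rows: the first three share anchor "a" and positive "p".
rows = [("a", "p", "n1"), ("a", "p", "n2"), ("a", "p", "n3"), ("b", "q", "n4")]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 3], [1], [2]]
```

As the last two batches show, heavily duplicated data can leave some batches smaller than batch_size.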

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
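Together, learning_rate 5e-05, lr_scheduler_type linear, and warmup_ratio 0.1 (warmup_steps is 0, so the ratio applies), with a batch size of 16 over 38,864 training samples, determine the learning-rate schedule. The sketch below reproduces the arithmetic. It assumes a single device, gradient_accumulation_steps 1, and one batch per 16 samples, which gives ceil(38,864 / 16) = 2,429 steps. The ceiling-rounded warmup and the linear warmup/decay formula follow the usual Hugging Face Transformers linear schedule, but treat this as an approximation rather than the trainer's own code.

```python
import math

NUM_SAMPLES = 38_864   # training set size from the card
BATCH_SIZE = 16        # per_device_train_batch_size
PEAK_LR = 5e-05        # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE)      # one epoch, single device (assumed)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 100, warmup_steps, 1000, 2400, total_steps):
    # step, fraction of the epoch completed, learning rate at that step
    print(step, round(step / total_steps, 4), f"{lr_at(step):.2e}")
```

For steps 100 and 2400, this gives epoch fractions of 0.0412 and 0.9881, which match the Training Logs table and are consistent with the assumed 2,429 steps per epoch.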

Training Logs

Epoch Step Training Loss Validation Loss
0.0412 100 0.049 0.0217
0.0823 200 0.073 0.0756
0.1235 300 0.1089 0.1086
0.1647 400 0.1255 0.0670
0.2058 500 0.0805 0.0876
0.2470 600 0.103 0.0718
0.2882 700 0.0559 0.0891
0.3294 800 0.1071 0.0660
0.3705 900 0.066 0.0728
0.4117 1000 0.0631 0.0631
0.4529 1100 0.0504 0.0484
0.4940 1200 0.0568 0.0482
0.5352 1300 0.042 0.0680
0.5764 1400 0.0416 0.0695
0.6175 1500 0.0221 0.0731
0.6587 1600 0.0426 0.0587
0.6999 1700 0.0291 0.0392
0.7410 1800 0.0285 0.0410
0.7822 1900 0.0204 0.0433
0.8234 2000 0.0269 0.0390
0.8646 2100 0.0224 0.0385
0.9057 2200 0.012 0.0391
0.9469 2300 0.0006 0.0392
0.9881 2400 0.0012 0.0396

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 0.1B params (Safetensors, tensor type F32)

Model repository: vaios-stergio/all-mpnet-base-v2-dblp-aminer-50k-triplets-cachedgistembedloss (finetuned from sentence-transformers/all-mpnet-base-v2)