SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the csv dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
  - csv
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
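The same three-stage pipeline (BERT encoder, mean pooling, L2 normalization) can be assembled by hand from the library's building blocks. The sketch below is illustrative only: it assumes the standard sentence_transformers.models API and initialises from the base checkpoint, so it does not load the fine-tuned weights published here (use the repo id shown in the Usage section for those).

from sentence_transformers import SentenceTransformer, models

# Rebuild the architecture above: Transformer encoder -> mean pooling -> Normalize
word_embedding = models.Transformer("sentence-transformers/all-MiniLM-L6-v2", max_seq_length=256)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
normalize = models.Normalize()
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])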
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/all-MiniLM-eedi-2024")
# Run inference
sentences = [
'Construct: Solve coordinate geometry questions involving ratio.\n\nQuestion: A straight line on squared paper. Points P, Q and R lie on this line. The leftmost end of the line is labelled P. If you travel right 4 squares and up 1 square you get to point Q. If you then travel 8 squares right and 2 squares up from Q you reach point R. What is the ratio of P Q: P R ?\n\nOptions:\nA. 1: 12\nB. 1: 4\nC. 1: 2\nD. 1: 3\n\nCorrect Answer: 1: 3\n\nIncorrect Answer: 1: 2\n\nPredicted Misconception: Misunderstanding the ratio calculation by not considering the correct horizontal and vertical distances between points P, Q, and R.',
'May have estimated when using ratios with geometry',
'Thinks x = y is an axis',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
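Because the model was trained to pair question-answer texts with misconception descriptions, a typical follow-up is to rank a pool of candidate misconceptions for a new QA pair. The sketch below reuses the variables from the example above; the candidate strings are illustrative (taken from sample rows later in this card), not an exhaustive misconception bank.

# Rank candidate misconceptions for the first QA-pair text from the example above
qa_embedding = model.encode([sentences[0]])
candidates = [
    "May have estimated when using ratios with geometry",
    "Thinks x = y is an axis",
    "Rounds up instead of down",
]
candidate_embeddings = model.encode(candidates)
scores = model.similarity(qa_embedding, candidate_embeddings)[0]  # shape [3]
best = int(scores.argmax())
print(candidates[best], float(scores[best]))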
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 12,210 training samples
- Columns: qa_pair_text, MisconceptionName, and negative
- Approximate statistics based on the first 1000 samples:
  |  | qa_pair_text | MisconceptionName | negative |
  |---|---|---|---|
  | type | string | string | string |
  | details | min: 54 tokens<br>mean: 121.45 tokens<br>max: 256 tokens | min: 4 tokens<br>mean: 15.16 tokens<br>max: 39 tokens | min: 7 tokens<br>mean: 14.49 tokens<br>max: 40 tokens |
- Samples:
  | qa_pair_text | MisconceptionName | negative |
  |---|---|---|
  | Construct: Construct frequency tables.<br>Question: Dave has recorded the number of pets his classmates have in the frequency table on the right.<br>Number of pets / Frequency<br>0 / 4<br>1 / … | | |
  | Construct: Convert between any other time periods.<br>Question: To work out how many hours in a year you could do...<br>Options:<br>A. 365 x 7<br>B. 365 x 60<br>C. 365 x 12<br>D. 365 x 24<br>Correct Answer: 365 x 24<br>Incorrect Answer: 365 x 60<br>Predicted Misconception: Multiplying days by hours per minute instead of hours per day. | Answers as if there are 60 hours in a day | Confuses an equation with an expression |
  | Construct: Given information about one part, work out other parts.<br>Question: Jess and Heena share some sweets in the ratio 3: 5.<br>Jess gets 15 sweets.<br>How many sweets does Heena get?<br>Options:<br>A. 17<br>B. 9<br>C. 5<br>D. 25<br>Correct Answer: 25<br>Incorrect Answer: 17<br>Predicted Misconception: Misunderstanding the direct proportionality between the ratio and actual quantities. | Thinks a difference of one part in a ratio means the quantities will differ by one unit | Believes dividing two positives will give a negative answer |
- Loss: MultipleNegativesRankingLoss with these parameters:
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
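MultipleNegativesRankingLoss treats each qa_pair_text as the anchor, the matching MisconceptionName as the positive, and the negative column as an explicit hard negative; the other positives in the batch act as additional in-batch negatives (which is why the no_duplicates batch sampler listed under the hyperparameters matters). A minimal training sketch, assuming the sentence-transformers v3 trainer API and a toy in-memory dataset with the same column names:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Toy (anchor, positive, hard-negative) triplets mirroring the csv columns above
train_dataset = Dataset.from_dict({
    "qa_pair_text": ["Construct: Round numbers to three or more decimal places. Question: ..."],
    "MisconceptionName": ["Rounds up instead of down"],
    "negative": ["Does not know how to calculate the mean"],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = MultipleNegativesRankingLoss(model, scale=20.0)  # cosine similarity is the default similarity_fct

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()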
Evaluation Dataset
csv
- Dataset: csv
- Size: 9,640 evaluation samples
- Columns: qa_pair_text, MisconceptionName, and negative
- Approximate statistics based on the first 1000 samples:
  |  | qa_pair_text | MisconceptionName | negative |
  |---|---|---|---|
  | type | string | string | string |
  | details | min: 56 tokens<br>mean: 119.35 tokens<br>max: 256 tokens | min: 6 tokens<br>mean: 14.51 tokens<br>max: 39 tokens | min: 6 tokens<br>mean: 13.86 tokens<br>max: 40 tokens |
- Samples:
  | qa_pair_text | MisconceptionName | negative |
  |---|---|---|
  | Construct: Identify when rounding a calculation will give an over or under approximation.<br>Question: Tom and Katie are discussing how to estimate the answer to 38.8745 / 7.9302<br>Tom says 40 / 7.9302 would give an overestimate.<br>Katie says 38.8745 / 8 would give an overestimate.<br>Who is correct?<br>Options:<br>A. Only Tom<br>B. Only Katie<br>C. Both Tom and Katie<br>D. Neither is correct<br>Correct Answer: Only Tom<br>Incorrect Answer: Neither is correct<br>Predicted Misconception: Rounding both numbers up leads to an overestimate. | Believes that the larger the dividend, the smaller the answer. | Does not know how to calculate the mean |
  | Construct: Substitute negative integer values into expressions involving no powers or roots.<br>Question: Amy is trying to work out the distance between these two points: (1,-6) and (-5,2) She labels them like this: x_1 y_1 x_2 … | | |
  | Construct: Round numbers to three or more decimal places.<br>Question: What is 20.15349 rounded to 3 decimal places?<br>Options:<br>A. 20.153<br>B. 20.15<br>C. 20.154<br>D. 20.253<br>Correct Answer: 20.153<br>Incorrect Answer: 20.154<br>Predicted Misconception: Rounding up the fourth decimal place without considering the fifth decimal place. | Rounds up instead of down | When dividing decimals, does not realize that the order and position of the digits (relative to each other) has to remain constant. |
- Loss: MultipleNegativesRankingLoss with these parameters:
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- gradient_accumulation_steps: 8
- learning_rate: 1e-05
- weight_decay: 0.01
- num_train_epochs: 40
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- batch_sampler: no_duplicates
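As a sketch of how these non-default values map onto the trainer configuration (assuming the SentenceTransformerTrainingArguments class from sentence-transformers v3; the output directory name is illustrative):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-eedi-2024",  # illustrative output path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=40,
    lr_scheduler_type="cosine",
    lr_scheduler_kwargs={"num_cycles": 20},
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)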
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 40
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.5026 | 12 | 2.2789 | - |
| 1.0052 | 24 | 2.1642 | 1.9746 |
| 1.4974 | 36 | 2.0463 | - |
| 2.0 | 48 | 1.8955 | 1.6808 |
| 2.4921 | 60 | 1.7692 | - |
| 2.9948 | 72 | 1.6528 | 1.4532 |
| 3.4869 | 84 | 1.5298 | - |
| 3.9895 | 96 | 1.4338 | 1.2853 |
| 4.4817 | 108 | 1.3374 | - |
| 4.9843 | 120 | 1.3084 | 1.2465 |
| 5.4764 | 132 | 1.2921 | - |
| 5.9791 | 144 | 1.2143 | 1.1766 |
| 6.4712 | 156 | 1.1689 | - |
| 6.9738 | 168 | 1.1656 | 1.1518 |
| 7.4660 | 180 | 1.1172 | - |
| 7.9686 | 192 | 1.0737 | 1.1080 |
| 8.4607 | 204 | 1.0373 | - |
| 8.9634 | 216 | 1.0445 | 1.0874 |
| 9.4555 | 228 | 0.9707 | - |
| 9.9581 | 240 | 0.9644 | 1.0649 |
| 10.4503 | 252 | 0.9252 | - |
| 10.9529 | 264 | 0.9211 | 1.0367 |
| 11.4450 | 276 | 0.8645 | - |
| 11.9476 | 288 | 0.8635 | 1.0297 |
| 12.4398 | 300 | 0.8279 | - |
| 12.9424 | 312 | 0.819 | 1.0161 |
| 13.4346 | 324 | 0.7684 | - |
| 13.9372 | 336 | 0.7842 | 1.0016 |
| 14.4293 | 348 | 0.7448 | - |
| 14.9319 | 360 | 0.7321 | 0.9951 |
| 15.4241 | 372 | 0.7064 | - |
| 15.9267 | 384 | 0.7161 | 0.9835 |
| 16.4188 | 396 | 0.6692 | - |
| 16.9215 | 408 | 0.6594 | 0.9774 |
| 17.4136 | 420 | 0.6405 | - |
| 17.9162 | 432 | 0.638 | 0.9723 |
| 18.4084 | 444 | 0.6 | - |
| 18.9110 | 456 | 0.6122 | 0.9706 |
| 19.4031 | 468 | 0.5763 | - |
| 19.9058 | 480 | 0.5787 | 0.9732 |
| 20.3979 | 492 | 0.5432 | - |
| 20.9005 | 504 | 0.5599 | 0.9618 |
| 21.3927 | 516 | 0.5245 | - |
| 21.8953 | 528 | 0.5278 | 0.9626 |
| 22.3874 | 540 | 0.4989 | - |
| 22.8901 | 552 | 0.509 | 0.9583 |
| 23.3822 | 564 | 0.4674 | - |
| **23.8848** | **576** | **0.4854** | **0.9573** |
| 24.3770 | 588 | 0.4619 | - |
| 24.8796 | 600 | 0.4631 | 0.9615 |
| 25.3717 | 612 | 0.4339 | - |
| 25.8743 | 624 | 0.4427 | 0.9593 |
| 26.3665 | 636 | 0.4225 | - |
| 26.8691 | 648 | 0.4245 | 0.9694 |
| 27.3613 | 660 | 0.3936 | - |
| 27.8639 | 672 | 0.4168 | 0.9586 |
| 28.3560 | 684 | 0.3835 | - |
| 28.8586 | 696 | 0.3921 | 0.9629 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.44.0
- PyTorch: 2.4.0
- Accelerate: 0.33.0
- Datasets: 2.19.2
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}