SentenceTransformer based on microsoft/mpnet-base

This is a sentence-transformers model finetuned from microsoft/mpnet-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: microsoft/mpnet-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
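
For reference, the cosine similarity named above is the standard normalized dot product. The symbols $u$ and $v$ are placeholders introduced here for two 768-dimensional embeddings; they do not appear in the original card.

```latex
\[
\operatorname{cos\_sim}(u, v)
  = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}
  = \frac{\sum_{i=1}^{768} u_i v_i}
         {\sqrt{\sum_{i=1}^{768} u_i^2}\,\sqrt{\sum_{i=1}^{768} v_i^2}}
\]
```

Scores lie in [-1, 1], and a vector compared with itself scores 1.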

Model Sources

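Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)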

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
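
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence vector is the average of its token vectors. The following is a minimal sketch of that idea in plain Python, assuming padding positions are excluded through an attention mask. The helper name mean_pool and the toy numbers are illustrative, not taken from the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (real tokens only)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [total / count for total in sums]


# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model the token vectors have width 768 and come from MPNetModel; the padding vector is ignored here because its mask entry is 0.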

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Areeb-02/mpnet-base-GISTEmbedLoss-MSEE_Evaluator-salestax-docs")
# Run inference
sentences = [
    'Based on the context information provided, what are the different gross receipts tax rates for businesses in San Francisco for tax years 2022, 2023, and 2024?',
    '$9.75 per $1,000) for taxable gross receipts over $25,000,000\n44SANCO\n2024 NAY LO\n(D) For tax year 2024 if the Controller certifies under Section 953.10 that the\nDEPARTMENT OF\n95% gross receipts threshold has been met for tax year 2024, and for tax years beginning on or after\nJanuary 1, 2025:\n0.814% (e.g. $8.14 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.853% (e.g. $8.53 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.93% (e.g. $9.30 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n1.008% (e.g. $10.08 per $1,000) for taxable gross receipts over $25,000,000\n(3) For all business activities not otherwise exempt and not elsewhere\nsubjected to a gross receipts tax rate or an administrative office tax by this Article 12-A-1:\n(B) For tax years 2022 and, if the Controller does not certify under\nSection 953.10 that the 90% gross receipts threshold has been met for tax year 2023, for tax\nyear 2023:\n0.788% (e.g. $7.88 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.825% (e.g. $8.25 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.9% (e.g. $9 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n0.975% (e.g. $9.75 per $1,000) for taxable gross receipts over $25,000,000\n(C) For tax year 2023 if the Controller certifies under Section 953.10 that the\n90% gross receipts threshold has been met for tax year 2023,',
    '(d) In no event shall the credit under this Section 960.4 reduce a person or combined group\'s\nGross Receipts Tax liability to less than $0 for any tax year. The credit under this Section shall not be\nrefundable and may not be carried forward to a subsequent year.\nSEC. 966. CONTROLLER REPORTS.\nThe Controller shall prepare reports by September 1, 2026, and September 1, 2027,\nrespectively, that discuss current economic conditions in the City and the performance of the tax system\nrevised by the voters in the ordinance adding this Section 966.\nSection 6. Article 21 of the Business and Tax Regulations Code is hereby amended by\nrevising Section 2106 to read as follows:\nSEC. 2106. SMALL BUSINESS EXEMPTION.\n(a) For tax years ending on or before December 31, 2024, nNotwithstanding any other\nprovision of this Article 21, a person or combined group exempt from payment of the gross\nreceipts tax under Section 954.1 of Article 12-A-1, as amended from time to time, shall also\nbe exempt from payment of the Early Care and Education Commercial Rents Tax.\n79SAN\nDL W(b) For tax years beginning on or after January 1, 2025, notwithstanding any other provision\nof this Article 21, a "small business enterprise" shall be exempt from payment of the Early Care and\nEducation Commercial Rents Tax. For purposes of this subsection (b), the term "small business\nenterprise" shall mean any person or combined group whose gross receipts within the City, determined\nunder Article 12-A-1, did not exceed $2,325,000, adjusted annually in accordance with the increase in\nthe Consumer Price Index: All Urban Consumers for the San Francisco/Oakland/Hayward Area for All\nItems as reported by the United States Bureau of Labor Statistics, or any successor to that index, as of\nDecember 31 of the calendar year two years prior to the tax year, beginning with tax year 2026, and\nrounded to the nearest $10,000. This subsection (b) shall not apply to a person or combined group\nsubject to a tax on administrative office business activities in Section 953.8 of Article 12-A-1.\nSection 7.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
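
For the semantic search use case, the similarity scores can be turned into a ranking of passages for a query. The sketch below is dependency-free and uses tiny hand-written vectors; in practice the query and passage vectors would come from model.encode(...). The names cosine and rank_passages are hypothetical helpers introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs, top_k=2):
    """Return (passage_index, score) pairs, best match first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy 2-dimensional stand-ins for 768-dimensional embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
ranking = rank_passages(query, passages)
for idx, score in ranking:
    print(idx, round(score, 3))
```

For larger collections a vectorized or indexed search would be the usual choice; the loop above only illustrates the ranking logic.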

Evaluation

Metrics

Knowledge Distillation

Dataset: stsb-dev
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-2.4282
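
The negative_mse value can be read as a negated, scaled mean squared error between teacher and student embeddings. The sketch below assumes the evaluator multiplies the raw MSE by 100 before negating it, which is my understanding of MSEEvaluator but should be checked against the library source. The function name and toy vectors are illustrative.

```python
def negative_mse(teacher_embeddings, student_embeddings, scale=100.0):
    """Negated mean squared error over all embedding components.

    The scale factor of 100 is an assumption about how the evaluator
    reports its score.
    """
    total = 0.0
    count = 0
    for teacher_vec, student_vec in zip(teacher_embeddings, student_embeddings):
        for t, s in zip(teacher_vec, student_vec):
            total += (t - s) ** 2
            count += 1
    return -(total / count) * scale


teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.9, 0.0], [0.0, 0.8]]
score = negative_mse(teacher, student)
print(round(score, 4))
```

Under that assumption, the reported -2.4282 would correspond to a raw MSE of about 0.024 per component; values closer to 0 indicate a closer match.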

Training Details

Training Dataset

Unnamed Dataset

Size: 238 training samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples (the dataset contains only 238):
	sentence1	sentence2
type	string	string
details	min: 5 tokens mean: 41.95 tokens max: 219 tokens	min: 63 tokens mean: 426.3 tokens max: 512 tokens

Samples:

sentence1	sentence2
`What types of businesses are subject to the gross receipts tax in San Francisco, and how is their San Francisco gross receipts calculated? What are the current rates for this tax, and are there any exemptions or scheduled increases?`	The Way It Is Now CHANGES TO BUSINESS TAXES The City collects various business taxes on an annual basis including: O • SAN FRANCISCO FILED 2024 MAY 15 PM 3:10 DEPARTMENT OF ELECTIONS A gross receipts tax that is a percentage of a business's San Francisco gross receipts. Depending on business type, the City determines a business's San Francisco gross receipts based on sales in San Francisco, payroll expenses for employees working there, or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming years. Rates depend on business type, and higher rates apply as a business generates more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million are exempt. A homelessness gross receipts tax that is an additional tax on businesses with San Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%. An overpaid executive gross receipts tax that is an additional tax on businesses that pay their highest-paid managerial employee much higher than the median compensation they pay their San Francisco employees. Rates are between 0.1% and 0.6%. A business registration fee that is an additional tax. For most businesses the fee is currently between $47 and $45,150, based on business type and amount of gross receipts. • An administrative office tax on payroll expenses that certain large businesses pay instead of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%, and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for these businesses currently range from $19,682 to $45,928. State law limits the total revenue, including tax revenue, the City may spend each year. The voters may approve increases to this limit for up to four years.
`What is the homelessness gross receipts tax, and which businesses are required to pay it? What are the current rates for this tax, and how do they vary based on the amount of San Francisco gross receipts? Are there any exemptions or scheduled increases for this tax?`	The Way It Is Now CHANGES TO BUSINESS TAXES The City collects various business taxes on an annual basis including: O • SAN FRANCISCO FILED 2024 MAY 15 PM 3:10 DEPARTMENT OF ELECTIONS A gross receipts tax that is a percentage of a business's San Francisco gross receipts. Depending on business type, the City determines a business's San Francisco gross receipts based on sales in San Francisco, payroll expenses for employees working there, or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming years. Rates depend on business type, and higher rates apply as a business generates more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million are exempt. A homelessness gross receipts tax that is an additional tax on businesses with San Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%. An overpaid executive gross receipts tax that is an additional tax on businesses that pay their highest-paid managerial employee much higher than the median compensation they pay their San Francisco employees. Rates are between 0.1% and 0.6%. A business registration fee that is an additional tax. For most businesses the fee is currently between $47 and $45,150, based on business type and amount of gross receipts. • An administrative office tax on payroll expenses that certain large businesses pay instead of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%, and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for these businesses currently range from $19,682 to $45,928. State law limits the total revenue, including tax revenue, the City may spend each year. The voters may approve increases to this limit for up to four years.
`What is the proposed measure that voters may approve to change the City's business taxes in San Francisco?`	The voters may approve increases to this limit for up to four years. The Proposal The proposed measure would change the City's business taxes to: • For the gross receipts tax: ○ recategorize business types, reducing the number from 14 to seven; determine San Francisco gross receipts for some businesses based less on payroll expenses and more on sales; o change rates to between 0.1% and 3.716%; and exempt most businesses with gross receipts up to $5 million (increased by inflation). Apply the homelessness gross receipts tax on business activities with San Francisco gross receipts over $25 million, at rates between 0.162% and 1.64%. Modify how the City calculates the overpaid executive gross receipts tax and who pays that tax, and set rates between 0.02% and 0.129%. Adjust business registration fees to between $55 and $60,000 (increased by inflation).Adjust the administrative office tax rates for certain large businesses to range from 2.97% to 3.694%, and the business registration fees for these taxpayers to between $500 and $35,000 (increased by inflation). Make administrative and other changes to the City's business taxes. The homelessness gross receipts tax would continue to fund services for people experiencing homelessness and homelessness prevention. The City would use the other taxes for general government purposes. All these taxes would apply indefinitely until repealed. This proposal would increase the City's spending limit for four years.SALITA CO 2024 MAY 10 PH 1:27 DEPARTMENT OF ELECTI "Local Small Business Tax Cut Ordinance" Be it ordained by the People of the City and County of San Francisco: NOTE: Unchanged Code text and uncodified text are in plain font. Additions to Codes are in single-underline italics Times New Roman font. Deletions to Codes are in strikethrough italics Times New Roman font. Asterisks (* * * *) indicate the omission of unchanged Code subsections or parts of tables. Section 1. Title. This initiative is known and may be referred to as the "Local Small Business Tax Cut Ordinance." Section 2. Article 2 of the Business and Tax Regulations Code is hereby amended by revising Section 76.3 to read as follows: SEC. 76.3.

Loss: GISTEmbedLoss with these parameters:

{'guide': SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), 'temperature': 0.01}
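
To make the role of the guide model and the temperature concrete, here is a simplified, dependency-free sketch of a GISTEmbed-style loss. It only contrasts anchors against positives; as I understand it, the library implementation also uses anchor-anchor and positive-positive pairs, so treat this as an illustration of the masking idea rather than the library's code. All function names are hypothetical.

```python
import math


def _unit(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _cosine_matrix(rows_a, rows_b):
    a = [_unit(v) for v in rows_a]
    b = [_unit(v) for v in rows_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def gist_style_loss(student_a, student_p, guide_a, guide_p, temperature=0.01):
    """Cross-entropy over in-batch candidates, filtered by the guide model.

    For anchor i, candidate j is dropped when the guide rates (i, j) as more
    similar than the true pair (i, i), since it is likely a false negative.
    """
    student_sim = _cosine_matrix(student_a, student_p)
    guide_sim = _cosine_matrix(guide_a, guide_p)
    losses = []
    for i, row in enumerate(student_sim):
        threshold = guide_sim[i][i]
        kept = [
            row[j] / temperature
            for j in range(len(row))
            if j == i or guide_sim[i][j] <= threshold
        ]
        peak = max(kept)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in kept))
        losses.append(log_z - row[i] / temperature)
    return sum(losses) / len(losses)


# The student cannot tell the two passages apart (identical embeddings).
student = [[1.0, 0.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Guide that sees the pairs as distinct: both negatives are kept.
unmasked = gist_style_loss(student, student, identity, identity)
# Guide that rates each off-diagonal pair above the true pair: both are dropped.
masked = gist_style_loss(student, student, identity, [[0.6, 0.8], [1.0, 0.0]])
print(round(unmasked, 4), round(masked, 4))
```

With a temperature as low as 0.01, small differences in cosine similarity are amplified a hundredfold before the softmax, which is why dropping suspected false negatives changes the loss so sharply in this toy case.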

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 1
warmup_ratio: 0.1
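
As a worked example of what these settings imply for a 238-sample dataset, the sketch below derives the number of optimizer steps and the linear warmup schedule. It assumes a single device, gradient_accumulation_steps of 1, no dropped last batch, and that warmup steps are rounded up from warmup_ratio times total steps, which is my understanding of the Transformers trainer. The function name is illustrative.

```python
import math


def linear_warmup_schedule(num_samples, batch_size, epochs, warmup_ratio, base_lr):
    """Return (total_steps, warmup_steps, lr_at) for a linear schedule."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)

    def lr_at(step):
        # Ramp up linearly during warmup, then decay linearly to zero.
        if step < warmup_steps:
            return base_lr * step / max(1, warmup_steps)
        remaining = max(0, total_steps - step)
        return base_lr * remaining / max(1, total_steps - warmup_steps)

    return total_steps, warmup_steps, lr_at


total_steps, warmup_steps, lr_at = linear_warmup_schedule(
    num_samples=238, batch_size=16, epochs=1, warmup_ratio=0.1, base_lr=5e-05
)
print(total_steps, warmup_steps)
print([lr_at(step) for step in (0, 1, 2, 15)])
```

Under these assumptions the whole run is only 15 optimizer steps, 2 of which are warmup, so the learning rate peaks at 5e-05 after the second step and decays from there.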

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	stsb-dev_negative_mse
0	0	-2.4282

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.41.2
PyTorch: 2.3.0+cu121
Accelerate: 0.31.0
Datasets: 2.20.0
Tokenizers: 0.19.1
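
To check a local environment against the versions listed above, a small standard-library script can help. This is a sketch: the PyPI distribution names in CARD_VERSIONS (for example torch for PyTorch) are my mapping from the labels above, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "accelerate": "0.31.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=CARD_VERSIONS):
    """Map each package to (version on the card, installed version, match flag)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, matches) in version_report().items():
    status = "ok" if matches else "differs"
    print(f"{package}: card={wanted} installed={found} [{status}]")
```

A mismatch is not necessarily a problem; the report only shows where the local setup differs from the one used for training.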

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, 
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Safetensors

Model size: 0.1B params
Tensor type: F32
