SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
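
The Pooling module uses last-token pooling (pooling_mode_lasttoken: True) and the final Normalize module scales each embedding to unit length. Below is a minimal sketch of what these two stages compute, assuming right-padded inputs with an attention mask (an illustration, not the library's implementation):

import torch
import torch.nn.functional as F

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    # Assumes right padding: the last non-padding token sits at index sum(mask) - 1.
    last_idx = attention_mask.sum(dim=1) - 1
    pooled = hidden_states[torch.arange(hidden_states.size(0)), last_idx]
    # L2-normalize so cosine similarity reduces to a dot product.
    return F.normalize(pooled, p=2, dim=1)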

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mercity/memory-retrieval-qwen3-0.6b-lora")
# Run inference
queries = [
    "Freelancer\u0027s on board, but I\u0027m anxious about the budget stretching for these fixes.",
]
documents = [
    'Victor strongly values authenticity and has explicitly stated he would rather deliver 4 excellent, deeply resonant ads than 6 mediocre ones that rely on viral gimmicks.',
    'Carlos successfully mastered the past tense (Passé Composé) last month by focusing solely on verb conjugation tables rather than contextual sentences.',
    'Raj recently won a $500 gift certificate specifically for experiences/tours redeemable through a regional travel aggregator.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.3463, -0.1885, -0.0302]])
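
Because the final Normalize module makes every embedding unit-length, the cosine similarities above are equivalent to dot products. A small follow-on sketch (not part of the generated card) that continues the example above and ranks the candidate documents for the query by score:

import numpy as np

# similarities has shape [1, 3]; sort document indices from most to least similar
ranking = np.argsort(-similarities.numpy()[0])
for rank, idx in enumerate(ranking, start=1):
    print(rank, round(float(similarities[0, idx]), 4), documents[idx][:60])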

Training Details

Training Dataset

Unnamed Dataset

  • Size: 369,891 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 9 tokens, mean 29.84 tokens, max 67 tokens
    • sentence_1 (string): min 16 tokens, mean 34.6 tokens, max 66 tokens
    • sentence_2 (string): min 16 tokens, mean 30.9 tokens, max 56 tokens
  • Samples:
    • Sample 1:
      • sentence_0: Our budget's stretched thin with the condo mortgage and student loans, yet we're locked into this $4k Ireland trip next summer—am I being irresponsible by not prioritizing career growth over these family splurges?
      • sentence_1: Liam's father, who was rarely present due to travel, recently reconciled with Liam and offered to pay for 100% of Ava's future college tuition, regardless of Liam's career path.
      • sentence_2: Liam is extremely dedicated to his career and has always prioritized professional advancement over personal comfort.
    • Sample 2:
      • sentence_0: I've got $1800 left after flights for my Tokyo conference trip—can you recommend budget vegan hotels close to the venue to keep things simple?
      • sentence_1: Lena's primary networking goal for this trip is securing a mentorship with Kenji Tanaka, a speaker at the conference known for hosting small, private dinners for potential mentees.
      • sentence_2: Lena is intensely private about her financial struggles and views asking for help or discussing debt openly as a personal failure.
    • Sample 3:
      • sentence_0: Inflation's killing my shop, but I wanna surprise Lena with milestone magic on a dime.
      • sentence_1: Last year, Marcus successfully organized a surprise 'Family Fun Day' at a local park for Lena's birthday, which she cited as the best day they'd had since before Jamal got sick, despite being low-cost.
      • sentence_2: Maria recently spent an hour reviewing the history of Mandarin tonal development since the Han Dynasty.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.5
    }
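
With the cosine distance metric and a margin of 0.5, the objective is max(d(anchor, positive) - d(anchor, negative) + 0.5, 0), where d is cosine distance and (sentence_0, sentence_1, sentence_2) play the roles of (anchor, positive, negative). A minimal sketch of how this loss could be configured in Sentence Transformers (illustrative, not the exact training script):

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
# Triplet loss: pull the anchor toward the positive and push it away from the
# negative until the cosine-distance gap exceeds the 0.5 margin.
train_loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.COSINE,
    triplet_margin=0.5,
)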
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
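
As a rough illustration, these non-default values map onto the trainer's arguments as follows (a sketch; output_dir is a hypothetical placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/memory-retrieval-qwen3-0.6b-lora",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)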

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0865 500 0.1927
0.1730 1000 0.1023
0.2595 1500 0.0841
0.3460 2000 0.0772
0.4325 2500 0.0704
0.5190 3000 0.0635
0.6055 3500 0.0576
0.6920 4000 0.0545
0.7785 4500 0.0522
0.8651 5000 0.0484
0.9516 5500 0.0465
1.0 5780 -
1.0381 6000 0.0388
1.1246 6500 0.0323
1.2111 7000 0.0318
1.2976 7500 0.0314
1.3841 8000 0.0294
1.4706 8500 0.0301
1.5571 9000 0.0283
1.6436 9500 0.0272
1.7301 10000 0.0247
1.8166 10500 0.0248
1.9031 11000 0.0233
1.9896 11500 0.0228
2.0 11560 -
2.0761 12000 0.0154
2.1626 12500 0.0141
2.2491 13000 0.0146
2.3356 13500 0.0135
2.4221 14000 0.0143
2.5087 14500 0.0139
2.5952 15000 0.014
2.6817 15500 0.0128
2.7682 16000 0.0126
2.8547 16500 0.0122
2.9412 17000 0.0114
3.0 17340 -
3.0277 17500 0.0104
3.1142 18000 0.007
3.2007 18500 0.0069
3.2872 19000 0.0067
3.3737 19500 0.0064
3.4602 20000 0.0067
3.5467 20500 0.0063
3.6332 21000 0.0061
3.7197 21500 0.0058
3.8062 22000 0.0058
3.8927 22500 0.0051
3.9792 23000 0.005
4.0 23120 -
4.0657 23500 0.0035
4.1522 24000 0.0027
4.2388 24500 0.0028
4.3253 25000 0.0025
4.4118 25500 0.0025
4.4983 26000 0.0024

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1
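
To approximate this environment, the listed versions can be pinned at install time (a sketch; install the PyTorch build matching your CUDA setup separately):

pip install "sentence-transformers==5.1.2" "transformers==4.57.1" "accelerate==1.11.0" "datasets==4.4.1" "tokenizers==0.22.1"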

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}