Qwen3-Embedding-4B-ONNX

This is an ONNX conversion of Qwen/Qwen3-Embedding-4B for use with Transformers.js in the browser.

Model Details

  • Model Type: Text Embedding
  • Base Model: Qwen3-Embedding-4B
  • Parameters: 4B
  • Embedding Dimensions: 2560
  • Context Length: 32K
  • MTEB v2 Score: 74.60
  • Languages: 100+

Usage (Transformers.js v3)

import { pipeline } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "dssjon/Qwen3-Embedding-4B-ONNX",
  {
    dtype: "fp32",
    device: "webgpu", // Use WebGPU for acceleration
  }
);

// Format query with instruction
const taskDescription = "Given a web search query, retrieve relevant passages that answer the query";
const query = `Instruct: ${taskDescription}\nQuery:What is the capital of China?`;

// Generate embedding
const output = await extractor(query, {
  pooling: "last_token",
  normalize: true
});

console.log(output.data); // 2560-dimensional embedding

Usage (Python - Original Model)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

# For queries
query = "What is the capital of China?"
query_embedding = model.encode(query, prompt_name="query")

# For documents (no prompt needed)
document = "The capital of China is Beijing."
doc_embedding = model.encode(document)

Conversion Details

  • ONNX Opset: 14
  • Precision: FP32
  • Optimization: None (Qwen3 not yet supported by ONNX Runtime optimizer)
  • File Size: ~15.3 GB
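For reference, an export with these settings can be reproduced with 🤗 Optimum's ONNX exporter along these lines (a hypothetical reconstruction of the conversion command, not the exact one used; Qwen3 export may require a recent Optimum version):

```shell
# Export the base model to ONNX at opset 14, FP32 (the exporter's default precision)
optimum-cli export onnx \
  --model Qwen/Qwen3-Embedding-4B \
  --task feature-extraction \
  --opset 14 \
  qwen3-embedding-4b-onnx/
```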

Performance

Benchmark scores from MTEB v2:

| Task                | Score |
|---------------------|-------|
| Classification      | 89.84 |
| Clustering          | 57.51 |
| Pair Classification | 87.01 |
| Reranking           | 50.76 |
| Retrieval           | 68.46 |
| STS                 | 88.72 |
| Summarization       | 34.39 |
| **Mean**            | 74.60 |

License

Apache 2.0 (same as base model)

Citation

@misc{qwen3embedding2025,
  title={Qwen3 Embedding},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen3-Embedding-4B}
}
