Qwen3-Embedding-4B-ONNX

This is an ONNX conversion of Qwen/Qwen3-Embedding-4B for use with Transformers.js in the browser.

Model Details

  • Model Type: Text Embedding
  • Base Model: Qwen3-Embedding-4B
  • Parameters: 4B
  • Embedding Dimensions: 2560
  • Context Length: 32K
  • MTEB v2 Score: 74.60
  • Languages: 100+

Usage (Transformers.js v3)

import { pipeline } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "dssjon/Qwen3-Embedding-4B-ONNX",
  {
    dtype: "fp32",
    device: "webgpu", // Use WebGPU for acceleration
  }
);

// Format query with instruction
const taskDescription = "Given a web search query, retrieve relevant passages that answer the query";
const query = `Instruct: ${taskDescription}\nQuery:What is the capital of China?`;

// Generate embedding
const output = await extractor(query, {
  pooling: "last_token",
  normalize: true
});

console.log(output.data); // 2560-dimensional embedding

Usage (Python - Original Model)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

# For queries
query = "What is the capital of China?"
query_embedding = model.encode(query, prompt_name="query")

# For documents (no prompt needed)
document = "The capital of China is Beijing."
doc_embedding = model.encode(document)

Conversion Details

  • ONNX Opset: 14
  • Precision: FP32
  • Optimization: None (Qwen3 not yet supported by ONNX Runtime optimizer)
  • File Size: ~15.3 GB
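For reference, an export with these settings can be reproduced with 🤗 Optimum's ONNX exporter along these lines (a hypothetical reconstruction of the conversion command, not the exact one used; Qwen3 export may require a recent Optimum version):

```shell
# Export the base model to ONNX at opset 14, FP32 (the exporter's default precision)
optimum-cli export onnx \
  --model Qwen/Qwen3-Embedding-4B \
  --task feature-extraction \
  --opset 14 \
  qwen3-embedding-4b-onnx/
```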

Performance

Benchmark scores from MTEB v2:

| Task                | Score |
|---------------------|-------|
| Classification      | 89.84 |
| Clustering          | 57.51 |
| Pair Classification | 87.01 |
| Reranking           | 50.76 |
| Retrieval           | 68.46 |
| STS                 | 88.72 |
| Summarization       | 34.39 |
| **Mean**            | 74.60 |

License

Apache 2.0 (same as base model)

Citation

@misc{qwen3embedding2025,
  title={Qwen3 Embedding},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen3-Embedding-4B}
}
