Qwen3-Embedding-4B CoreML Models

Pre-converted CoreML models for Apple Silicon (M1/M2/M3), optimized for the Papr Memory Python SDK.

Available Variants

  • FP16 (Recommended): 7.5GB, ~70ms inference, <0.1% accuracy loss
  • INT8: 4GB, ~100-150ms inference, ~1-2% accuracy loss

Performance

Variant  Size    Latency     Accuracy  Hardware
FP16     7.5GB   ~70ms       99.99%    ANE + GPU
INT8     4GB     ~100-150ms  98-99%    GPU
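
Which variant to pick mostly comes down to disk and memory headroom. A small illustrative heuristic (the 10GB threshold is an assumption derived from the sizes above, not an official recommendation):

import shutil

# Hypothetical heuristic: prefer FP16 when there is comfortable headroom
# for its 7.5GB package; otherwise fall back to the 4GB INT8 package.
free_gb = shutil.disk_usage(".").free / 1e9
variant = "fp16" if free_gb >= 10 else "int8"
print(f"Selected variant: {variant}")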

Usage

from papr_memory import Papr
import os

# Download model (one-time)
from huggingface_hub import snapshot_download
model_path = snapshot_download(
    repo_id="papr-ai/Qwen3-Embedding-4B-CoreML",
    allow_patterns=["fp16/*"],
    local_dir="./coreml"
)

# Configure environment
os.environ["PAPR_ENABLE_COREML"] = "true"
os.environ["PAPR_COREML_MODEL"] = "./coreml/fp16"

# Use the SDK as usual; query embeddings now run on-device via CoreML
client = Papr(x_api_key="your_key")
results = client.memory.search(query="test", max_memories=10)
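
To confirm the downloaded package loads before wiring it into the SDK, it can be opened directly with coremltools. This is a minimal smoke test; the file name model.mlpackage is an assumption (it matches the --out name used by the conversion script below), so adjust the path to the actual downloaded files:

import coremltools as ct

# Load the package and let Core ML choose among ANE/GPU/CPU
mlmodel = ct.models.MLModel(
    "./coreml/fp16/model.mlpackage",  # assumed file name; adjust as needed
    compute_units=ct.ComputeUnit.ALL,
)
print(mlmodel.get_spec().description.input)   # expected inputs
print(mlmodel.get_spec().description.output)  # embedding output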

Manual Download

# FP16 (recommended)
huggingface-cli download papr-ai/Qwen3-Embedding-4B-CoreML --include "fp16/*" --local-dir ./coreml

# INT8 (smaller, slightly slower)
huggingface-cli download papr-ai/Qwen3-Embedding-4B-CoreML --include "int8/*" --local-dir ./coreml
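
The same downloads can be scripted with huggingface_hub; this mirrors the FP16 snippet in Usage but fetches the INT8 variant:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="papr-ai/Qwen3-Embedding-4B-CoreML",
    allow_patterns=["int8/*"],  # fetch only the INT8 package
    local_dir="./coreml",
)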

Build Yourself

Alternatively, build from source:

pip install coremltools transformers torch
python scripts/coreml_models/convert_qwen_coreml.py \
  --hf Qwen/Qwen3-Embedding-4B \
  --out ./coreml/model.mlpackage \
  --fp16
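
If the script is unavailable, the conversion it performs can be approximated directly with coremltools. The sketch below is illustrative rather than the actual convert_qwen_coreml.py; the sequence length, tensor names, and tokenizer settings are assumptions:

import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer

# Load the base model in TorchScript-friendly mode and trace one forward pass
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-4B", torchscript=True)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-4B")
example = tokenizer("hello world", return_tensors="pt",
                    padding="max_length", max_length=128)
traced = torch.jit.trace(
    model, (example["input_ids"], example["attention_mask"]))

# Convert the traced graph to Core ML with FP16 weights (the --fp16 path)
mlmodel = ct.convert(
    traced,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, 128), dtype=np.int32),
    ],
    compute_precision=ct.precision.FLOAT16,
)
mlmodel.save("./coreml/model.mlpackage")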

Citation

@software{qwen3_coreml,
  title = {Qwen3-Embedding-4B CoreML},
  author = {Papr AI},
  year = {2025},
  url = {https://huggingface.co/papr-ai/Qwen3-Embedding-4B-CoreML}
}

License

Apache 2.0 (same as the base Qwen3 model)
