# Qwen3-Embedding-4B CoreML Models

Pre-converted CoreML models for Apple Silicon (M1/M2/M3), optimized for the Papr Memory Python SDK.
## Available Variants

- **FP16 (Recommended):** 7.5GB, ~70ms inference, <0.1% accuracy loss
- **INT8:** 4GB, ~100-150ms inference, ~1-2% accuracy loss
## Performance

| Variant | Size | Latency | Accuracy | Hardware |
|---|---|---|---|---|
| FP16 | 7.5GB | 70ms | 99.99% | ANE + GPU |
| INT8 | 4GB | 100-150ms | 98-99% | GPU |
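Which variant to use mostly comes down to available memory. The sketch below is illustrative only (the `pick_variant` helper is not part of the SDK, and the 16GB threshold is an assumption); it reads total physical RAM on macOS via `sysctl hw.memsize`:

```python
import subprocess

def pick_variant() -> str:
    """Rough heuristic: prefer FP16 when there is comfortably
    more RAM than the 7.5GB model needs."""
    # Total physical memory in bytes (macOS only).
    mem_bytes = int(
        subprocess.check_output(["sysctl", "-n", "hw.memsize"]).decode().strip()
    )
    return "fp16" if mem_bytes >= 16 * 1024**3 else "int8"

print(pick_variant())  # e.g. "fp16" on a 16GB+ machine
```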
## Usage
```python
import os

from huggingface_hub import snapshot_download
from papr_memory import Papr

# Download model (one-time)
model_path = snapshot_download(
    repo_id="papr-ai/Qwen3-Embedding-4B-CoreML",
    allow_patterns=["fp16/*"],
    local_dir="./coreml",
)

# Configure environment
os.environ["PAPR_ENABLE_COREML"] = "true"
os.environ["PAPR_COREML_MODEL"] = "./coreml/fp16"

# Use in SDK
client = Papr(x_api_key="your_key")
results = client.memory.search(query="test", max_memories=10)
```
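Before wiring the model into the SDK, you can sanity-check the package directly with `coremltools`. This is a minimal sketch: the `.mlpackage` filename and the `input_ids`/`attention_mask` feature names are assumptions, so inspect `model.input_description` for the names this package actually uses:

```python
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

# Load the CoreML package (adjust the filename to match what was downloaded).
model = ct.models.MLModel("./coreml/fp16/model.mlpackage")

# Tokenize with the original Qwen tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-4B")
enc = tokenizer("test query", return_tensors="np")

# Feature names here are assumed; check model.input_description to confirm.
out = model.predict({
    "input_ids": enc["input_ids"].astype(np.int32),
    "attention_mask": enc["attention_mask"].astype(np.int32),
})
print({k: np.asarray(v).shape for k, v in out.items()})
```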
## Manual Download
```bash
# FP16 (recommended)
huggingface-cli download papr-ai/Qwen3-Embedding-4B-CoreML --include "fp16/*" --local-dir ./coreml

# INT8 (smaller, slightly slower)
huggingface-cli download papr-ai/Qwen3-Embedding-4B-CoreML --include "int8/*" --local-dir ./coreml
```
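After either download path, it is worth confirming the package landed where `PAPR_COREML_MODEL` will point. The directory layout below is an assumption; adjust the path if the repo nests files differently:

```python
from pathlib import Path

# Quick check that a CoreML package exists where the SDK expects it.
pkg = next(Path("./coreml/fp16").glob("*.mlpackage"), None)
assert pkg is not None, "no .mlpackage found under ./coreml/fp16"

# An .mlpackage is a directory bundle, so sum its files to get the size.
size_gb = sum(f.stat().st_size for f in pkg.rglob("*") if f.is_file()) / 1e9
print(f"found {pkg} ({size_gb:.1f} GB)")
```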
## Build Yourself
Alternatively, build from source:
```bash
pip install coremltools transformers torch

python scripts/coreml_models/convert_qwen_coreml.py \
  --hf Qwen/Qwen3-Embedding-4B \
  --out ./coreml/model.mlpackage \
  --fp16
```
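For context, the sketch below shows roughly what such a conversion involves; it is not the contents of `convert_qwen_coreml.py`. It assumes the checkpoint traces cleanly with a fixed sequence length, which large transformer models often need extra work for (patched attention, shape pinning, etc.):

```python
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-Embedding-4B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, torchscript=True).eval()

# Trace with a fixed-shape example input (CoreML conversion needs a traced graph).
example = tokenizer(
    "hello world", return_tensors="pt", padding="max_length", max_length=512
)
traced = torch.jit.trace(
    model, (example["input_ids"], example["attention_mask"]), strict=False
)

# Convert to an ML Program with FP16 weights and compute.
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,
    inputs=[
        ct.TensorType(name="input_ids", shape=example["input_ids"].shape, dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=example["attention_mask"].shape, dtype=np.int32),
    ],
)
mlmodel.save("./coreml/model.mlpackage")
```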
## Citation
```bibtex
@software{qwen3_coreml,
  title  = {Qwen3-Embedding-4B CoreML},
  author = {Papr AI},
  year   = {2025},
  url    = {https://huggingface.co/papr-ai/Qwen3-Embedding-4B-CoreML}
}
```
## License

Apache 2.0 (same as the base Qwen3 model).