# Breeze ASR 25 — WhisperKit CoreML (4-bit Palettized)
The first CoreML conversion of MediaTek's Breeze ASR 25, optimized for on-device inference on Apple devices via WhisperKit.
Breeze ASR 25 is fine-tuned from Whisper large-v2 with significantly better performance on Taiwanese Mandarin and Mandarin-English code-switching — up to 56% lower WER compared to the original Whisper large-v2.
This repo provides the 4-bit palettized CoreML version with outlier decomposition, ready for iPhone, iPad, Mac, and potentially Vision Pro.
## Model Details
| Property | Value |
|---|---|
| Base Model | MediaTek-Research/Breeze-ASR-25 (Whisper large-v2) |
| Parameters | 1.55B |
| Compression | 4-bit palettization + outlier decomposition |
| Conversion Tool | whisperkittools + coremltools |
| Compute Unit | CPU + Apple Neural Engine (ANE) |
| Minimum iOS | iOS 16+ |
## File Sizes
| Component | FP16 | 4-bit | Compression |
|---|---|---|---|
| AudioEncoder | 1.2 GB | 410 MB | 3x |
| TextDecoder | 1.7 GB | 638 MB | 2.7x |
| MelSpectrogram | 372 KB | 372 KB | — |
| Total | ~2.9 GB | ~1.05 GB | ~2.8x |
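The compression ratios above can be sanity-checked from the table's own numbers; a minimal sketch (sizes in MB, with the totals rounded the same way the table rounds them):

```swift
// Sanity-check the compression ratios reported in the file-size table.
// All values come straight from the table (sizes in MB).
let encoderRatio = 1200.0 / 410.0   // AudioEncoder: ≈ 2.9, reported as 3x
let decoderRatio = 1700.0 / 638.0   // TextDecoder:  ≈ 2.7x
let totalRatio = (1200.0 + 1700.0 + 0.372) / (410.0 + 638.0 + 0.372)  // ≈ 2.8x
```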
## Performance (from the Breeze ASR 25 paper)
### Short-form Audio
| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
|---|---|---|---|
| ASCEND-MIX (code-switching) | 21.01 | 16.38 | -22% |
| CommonVoice16-zh-TW | 9.84 | 7.97 | -19% |
| CSZS-zh-en (code-switching) | 29.49 | 13.01 | -56% |
### Long-form Audio
| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
|---|---|---|---|
| ML-lecture-2021-long | 6.13 | 4.98 | -19% |
| Formosa-Go | 15.03 | 13.61 | -9% |
| FormosaSpeech | 22.34 | 22.09 | -1% |
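The "Improvement" column in both tables is the relative WER reduction against Whisper large-v2; a quick sketch reproducing two of the reported figures:

```swift
// Relative WER reduction: (baseline - model) / baseline,
// using the numbers from the tables above.
func werReduction(baseline: Double, model: Double) -> Double {
    (baseline - model) / baseline
}

// CSZS-zh-en: 29.49 → 13.01, reported as -56%
let cszs = werReduction(baseline: 29.49, model: 13.01)
// ML-lecture-2021-long: 6.13 → 4.98, reported as -19%
let lecture = werReduction(baseline: 6.13, model: 4.98)
```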
## Usage with WhisperKit
### Swift (iOS / macOS / visionOS)
```swift
import WhisperKit

// Download and extract the model files to a local directory first
let config = WhisperKitConfig(
    modelFolder: "/path/to/breeze-asr-25-whisperkit-coreml",
    download: false
)
let whisperKit = try await WhisperKit(config)
let result = try await whisperKit.transcribe(audioPath: "audio.wav")
print(result.text)
```
### Required Files
Place these files in your model folder:
```
breeze-asr-25-whisperkit-coreml/
├── AudioEncoder.mlmodelc/
├── TextDecoder.mlmodelc/
└── MelSpectrogram.mlmodelc/
```
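Since WhisperKit fails to load a folder with missing components, it can help to validate the layout up front. A minimal sketch using Foundation; `hasRequiredModelFiles` is a hypothetical helper, not part of the WhisperKit API:

```swift
import Foundation

/// Returns true if the folder contains the three compiled CoreML
/// bundles this repo ships (each .mlmodelc is a directory).
func hasRequiredModelFiles(at folder: URL) -> Bool {
    let required = ["AudioEncoder.mlmodelc",
                    "TextDecoder.mlmodelc",
                    "MelSpectrogram.mlmodelc"]
    let fm = FileManager.default
    return required.allSatisfy { name in
        var isDirectory: ObjCBool = false
        let path = folder.appendingPathComponent(name).path
        return fm.fileExists(atPath: path, isDirectory: &isDirectory)
            && isDirectory.boolValue
    }
}
```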
## Conversion Details
- **Method:** 4-bit palettization via `whisperkit-generate-model`
- **Outlier decomposition:** enabled; outlier weights are preserved at higher precision to retain quality
- **PSNR:** AudioEncoder `torch2coreml` = 58.6 dB (excellent)
- **Conversion time:** ~4.5 hours on M1 Max (single-threaded CoreML compilation)
### Conversion Command
```shell
whisperkit-generate-model \
  --model-version MediaTek-Research/Breeze-ASR-25 \
  --output-dir ./output \
  --generate-quantized-variants \
  --allowed-nbits 4 \
  --force-recipe-nbits \
  --outlier-decomp \
  --generate-decoder-context-prefill-data
```
## Expected Device Performance
| Device | Estimated Latency (30s audio) | Notes |
|---|---|---|
| iPhone 15 Pro+ (A17+) | ~3-5s | ANE accelerated |
| iPhone 14 Pro (A16) | ~4-6s | ANE accelerated |
| Mac M1+ | ~2-4s | Full ANE + GPU |
| Vision Pro (M2) | ~3-5s | Needs verification |
## Limitations
- **Not real-time:** At ~1 GB, this model is best for batch transcription, not live dictation. For real-time keyboard use, consider smaller models (`whisper-base` or `whisper-small`).
- **First load time:** ANE compilation on first use takes ~10-12 minutes per component. Subsequent loads use the cached compilation.
- **visionOS:** CoreML is available on visionOS, but WhisperKit's visionOS support needs verification.
## License
This model is released under the Apache 2.0 License, same as the original Breeze ASR 25 model.
## Credits
- MediaTek Research for Breeze ASR 25
- Argmax (WhisperKit) for whisperkittools and the on-device ASR framework
- Apple for coremltools
## Citation
If you use this model, please cite the original Breeze ASR 25 paper:
```bibtex
@article{chou2025selfrefiningframeworkenhancingasr,
  title={A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data},
  author={Cheng Kang Chou and Chan-Jan Hsu and Ho-Lam Chung and Liang-Hsuan Tseng and Hsi-Chun Cheng and Yu-Kuan Fu and Kuan Po Huang and Hung-Yi Lee},
  journal={arXiv preprint arXiv:2506.11130},
  year={2025}
}
```