# Breeze ASR 25 — WhisperKit CoreML (4-bit Palettized)
The first CoreML conversion of MediaTek's Breeze ASR 25, optimized for on-device inference on Apple devices via WhisperKit.
Breeze ASR 25 is fine-tuned from Whisper large-v2 with significantly better performance on Taiwanese Mandarin and Mandarin-English code-switching — up to 56% lower WER compared to the original Whisper large-v2.
This repo provides the 4-bit palettized CoreML version with outlier decomposition, ready for iPhone, iPad, Mac, and potentially Vision Pro.
## Model Details
| Property | Value |
|---|---|
| Base Model | MediaTek-Research/Breeze-ASR-25 (Whisper large-v2) |
| Parameters | 1.55B |
| Compression | 4-bit palettization + outlier decomposition |
| Conversion Tool | whisperkittools + coremltools |
| Compute Unit | CPU + Apple Neural Engine (ANE) |
| Minimum iOS | iOS 16+ |
## File Sizes
| Component | FP16 | 4-bit | Compression |
|---|---|---|---|
| AudioEncoder | 1.2 GB | 410 MB | 3x |
| TextDecoder | 1.7 GB | 638 MB | 2.7x |
| MelSpectrogram | 372 KB | 372 KB | — |
| Total | ~2.9 GB | ~1.05 GB | ~2.8x |
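The compression ratios above can be sanity-checked from the table's own numbers; a minimal sketch (sizes in MB, with the totals rounded the same way the table rounds them):

```swift
// Sanity-check the compression ratios reported in the file-size table.
// All values come straight from the table (sizes in MB).
let encoderRatio = 1200.0 / 410.0   // AudioEncoder: ≈ 2.9, reported as 3x
let decoderRatio = 1700.0 / 638.0   // TextDecoder:  ≈ 2.7x
let totalRatio = (1200.0 + 1700.0 + 0.372) / (410.0 + 638.0 + 0.372)  // ≈ 2.8x
```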
## Performance (from the Breeze ASR 25 paper)
### Short-form Audio
| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
|---|---|---|---|
| ASCEND-MIX (code-switching) | 21.01 | 16.38 | -22% |
| CommonVoice16-zh-TW | 9.84 | 7.97 | -19% |
| CSZS-zh-en (code-switching) | 29.49 | 13.01 | -56% |
### Long-form Audio
| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
|---|---|---|---|
| ML-lecture-2021-long | 6.13 | 4.98 | -19% |
| Formosa-Go | 15.03 | 13.61 | -9% |
| FormosaSpeech | 22.34 | 22.09 | -1% |
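The "Improvement" column in both tables is the relative WER reduction against Whisper large-v2; a quick sketch reproducing two of the reported figures:

```swift
// Relative WER reduction: (baseline - model) / baseline,
// using the numbers from the tables above.
func werReduction(baseline: Double, model: Double) -> Double {
    (baseline - model) / baseline
}

// CSZS-zh-en: 29.49 → 13.01, reported as -56%
let cszs = werReduction(baseline: 29.49, model: 13.01)
// ML-lecture-2021-long: 6.13 → 4.98, reported as -19%
let lecture = werReduction(baseline: 6.13, model: 4.98)
```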
## Usage with WhisperKit
### Swift (iOS / macOS / visionOS)
```swift
import WhisperKit

// Download and extract the model files to a local directory first
let config = WhisperKitConfig(
    modelFolder: "/path/to/breeze-asr-25-whisperkit-coreml",
    download: false
)
let whisperKit = try await WhisperKit(config)
let result = try await whisperKit.transcribe(audioPath: "audio.wav")
print(result.text)
```
### Required Files
Place these files in your model folder:
```
breeze-asr-25-whisperkit-coreml/
├── AudioEncoder.mlmodelc/
├── TextDecoder.mlmodelc/
└── MelSpectrogram.mlmodelc/
```
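Since WhisperKit fails to load a folder with missing components, it can help to validate the layout up front. A minimal sketch using Foundation; `hasRequiredModelFiles` is a hypothetical helper, not part of the WhisperKit API:

```swift
import Foundation

/// Returns true if the folder contains the three compiled CoreML
/// bundles this repo ships (each .mlmodelc is a directory).
func hasRequiredModelFiles(at folder: URL) -> Bool {
    let required = ["AudioEncoder.mlmodelc",
                    "TextDecoder.mlmodelc",
                    "MelSpectrogram.mlmodelc"]
    let fm = FileManager.default
    return required.allSatisfy { name in
        var isDirectory: ObjCBool = false
        let path = folder.appendingPathComponent(name).path
        return fm.fileExists(atPath: path, isDirectory: &isDirectory)
            && isDirectory.boolValue
    }
}
```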
## Conversion Details
- **Method:** 4-bit palettization via `whisperkit-generate-model`
- **Outlier decomposition:** enabled; outlier weights are preserved at higher precision to retain quality
- **PSNR:** AudioEncoder `torch2coreml` = 58.6 dB (excellent)
- **Conversion time:** ~4.5 hours on M1 Max (single-threaded CoreML compilation)
### Conversion Command
```shell
whisperkit-generate-model \
  --model-version MediaTek-Research/Breeze-ASR-25 \
  --output-dir ./output \
  --generate-quantized-variants \
  --allowed-nbits 4 \
  --force-recipe-nbits \
  --outlier-decomp \
  --generate-decoder-context-prefill-data
```
## Expected Device Performance
| Device | Estimated Latency (30s audio) | Notes |
|---|---|---|
| iPhone 15 Pro+ (A17+) | ~3-5s | ANE accelerated |
| iPhone 14 Pro (A16) | ~4-6s | ANE accelerated |
| Mac M1+ | ~2-4s | Full ANE + GPU |
| Vision Pro (M2) | ~3-5s | Needs verification |
## Limitations
- **Not real-time:** At ~1 GB, this model is best for batch transcription, not live dictation. For real-time keyboard use, consider smaller models (`whisper-base` or `whisper-small`).
- **First load time:** ANE compilation on first use takes ~10-12 minutes per component. Subsequent loads use the cached compilation.
- **visionOS:** CoreML is available on visionOS, but WhisperKit's visionOS support needs verification.
## License
This model is released under the Apache 2.0 License, same as the original Breeze ASR 25 model.
## Credits
- MediaTek Research for Breeze ASR 25
- Argmax (WhisperKit) for whisperkittools and the on-device ASR framework
- Apple for coremltools
## Citation
If you use this model, please cite the original Breeze ASR 25 paper:
```bibtex
@article{chou2025selfrefiningframeworkenhancingasr,
  title={A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data},
  author={Cheng Kang Chou and Chan-Jan Hsu and Ho-Lam Chung and Liang-Hsuan Tseng and Hsi-Chun Cheng and Yu-Kuan Fu and Kuan Po Huang and Hung-Yi Lee},
  journal={arXiv preprint arXiv:2506.11130},
  year={2025}
}
```