# Breeze ASR 25 — WhisperKit CoreML (4-bit Palettized)

The first CoreML conversion of MediaTek's Breeze ASR 25, optimized for on-device inference on Apple devices via WhisperKit.

Breeze ASR 25 is fine-tuned from Whisper large-v2 with significantly better performance on Taiwanese Mandarin and Mandarin-English code-switching — up to 56% lower WER compared to the original Whisper large-v2.

This repo provides the 4-bit palettized CoreML version with outlier decomposition, ready for iPhone, iPad, Mac, and potentially Vision Pro.

## Model Details

| Property | Value |
| --- | --- |
| Base Model | MediaTek-Research/Breeze-ASR-25 (Whisper large-v2) |
| Parameters | 1.55B |
| Compression | 4-bit palettization + outlier decomposition |
| Conversion Tool | whisperkittools + coremltools |
| Compute Unit | CPU + Apple Neural Engine (ANE) |
| Minimum iOS | iOS 16+ |

## File Sizes

| Component | FP16 | 4-bit | Compression |
| --- | --- | --- | --- |
| AudioEncoder | 1.2 GB | 410 MB | 3x |
| TextDecoder | 1.7 GB | 638 MB | 2.7x |
| MelSpectrogram | 372 KB | 372 KB | 1x |
| **Total** | ~2.9 GB | ~1.05 GB | ~2.8x |
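The totals above can be sanity-checked with a few lines of arithmetic. This is only a back-of-envelope check using the component sizes from the table (decimal units, 1 GB = 1000 MB), not sizes measured on disk:

```python
# Component sizes from the table above, in MB (decimal units: 1 GB = 1000 MB)
fp16 = {"AudioEncoder": 1200, "TextDecoder": 1700, "MelSpectrogram": 0.372}
q4   = {"AudioEncoder": 410,  "TextDecoder": 638,  "MelSpectrogram": 0.372}

total_fp16 = sum(fp16.values())   # ~2900 MB ≈ 2.9 GB
total_q4   = sum(q4.values())     # ~1048 MB ≈ 1.05 GB
ratio = total_fp16 / total_q4     # overall compression ratio

print(f"FP16: {total_fp16/1000:.2f} GB, 4-bit: {total_q4/1000:.2f} GB, {ratio:.1f}x")
# -> FP16: 2.90 GB, 4-bit: 1.05 GB, 2.8x
```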

## Performance (from the Breeze ASR 25 paper)

### Short-form Audio

| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
| --- | --- | --- | --- |
| ASCEND-MIX (code-switching) | 21.01 | 16.38 | -22% |
| CommonVoice16-zh-TW | 9.84 | 7.97 | -19% |
| CSZS-zh-en (code-switching) | 29.49 | 13.01 | -56% |

### Long-form Audio

| Dataset | Whisper large-v2 WER ↓ | Breeze ASR 25 WER ↓ | Improvement |
| --- | --- | --- | --- |
| ML-lecture-2021-long | 6.13 | 4.98 | -19% |
| Formosa-Go | 15.03 | 13.61 | -9% |
| FormosaSpeech | 22.34 | 22.09 | -1% |
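The Improvement column is the relative WER reduction. A small helper reproduces the headline numbers from the WER values in the tables above:

```python
def rel_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent; positive means the new model is better."""
    return (baseline_wer - new_wer) / baseline_wer * 100

# CSZS-zh-en, the headline code-switching result
print(round(rel_improvement(29.49, 13.01)))  # -> 56

# FormosaSpeech, where the gain is marginal
print(round(rel_improvement(22.34, 22.09)))  # -> 1
```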

## Usage with WhisperKit

### Swift (iOS / macOS / visionOS)

```swift
import WhisperKit

// Download and extract the model files to a local directory first
let config = WhisperKitConfig(
    modelFolder: "/path/to/breeze-asr-25-whisperkit-coreml",
    download: false
)
let whisperKit = try await WhisperKit(config)
let result = try await whisperKit.transcribe(audioPath: "audio.wav")
print(result.text)
```

### Required Files

Place these files in your model folder:

```
breeze-asr-25-whisperkit-coreml/
├── AudioEncoder.mlmodelc/
├── TextDecoder.mlmodelc/
└── MelSpectrogram.mlmodelc/
```

## Conversion Details

- **Method:** 4-bit palettization via `whisperkit-generate-model`
- **Outlier decomposition:** enabled; outlier weights are preserved at higher precision to protect quality
- **PSNR:** AudioEncoder torch2coreml = 58.6 dB (excellent)
- **Conversion time:** ~4.5 hours on M1 Max (single-threaded CoreML compilation)
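To illustrate the idea behind palettization with outlier decomposition, here is a toy NumPy sketch — not the actual coremltools implementation, and all names in it are hypothetical. Weights are clustered onto a 16-entry (4-bit) palette, while a few large-magnitude outliers are kept at full precision so they do not drag the palette around:

```python
import numpy as np

def palettize(w, n_entries=16, n_iters=25):
    """Toy k-means palettization: snap each weight to one of n_entries palette values."""
    palette = np.quantile(w, np.linspace(0, 1, n_entries))  # init centroids
    for _ in range(n_iters):
        idx = np.abs(w[:, None] - palette[None, :]).argmin(axis=1)
        for k in range(n_entries):
            if np.any(idx == k):  # guard against empty clusters
                palette[k] = w[idx == k].mean()
    return palette[idx]  # reconstructed weights

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
w[:8] *= 25.0  # inject a handful of large-magnitude outliers

# Plain 4-bit palettization: outliers consume palette entries
err_plain = np.abs(palettize(w) - w).mean()

# Outlier decomposition: palettize only the bulk, keep outliers at full precision
mask = np.abs(w - w.mean()) > 3 * w.std()
recon = w.copy()
recon[~mask] = palettize(w[~mask])
err_decomp = np.abs(recon - w).mean()

print(f"mean abs error, plain: {err_plain:.3f}, with outlier split: {err_decomp:.3f}")
```

Keeping the rare extreme weights out of the clustering lets all 16 palette entries cover the dense bulk of the distribution, which is why the decomposed variant reconstructs with lower error.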

### Conversion Command

```bash
whisperkit-generate-model \
    --model-version MediaTek-Research/Breeze-ASR-25 \
    --output-dir ./output \
    --generate-quantized-variants \
    --allowed-nbits 4 \
    --force-recipe-nbits \
    --outlier-decomp \
    --generate-decoder-context-prefill-data
```

## Expected Device Performance

| Device | Estimated Latency (30 s audio) | Notes |
| --- | --- | --- |
| iPhone 15 Pro+ (A17+) | ~3-5 s | ANE accelerated |
| iPhone 14 Pro (A16) | ~4-6 s | ANE accelerated |
| Mac M1+ | ~2-4 s | Full ANE + GPU |
| Vision Pro (M2) | ~3-5 s | Needs verification |

## Limitations

- **Not real-time:** At ~1 GB, this model is best suited to batch transcription, not live dictation. For real-time use such as keyboard dictation, consider smaller models (whisper-base or whisper-small).
- **First load time:** ANE compilation on first use takes ~10-12 minutes per component; subsequent loads use the cached result.
- **visionOS:** CoreML is available on visionOS, but WhisperKit's visionOS support still needs verification.

## License

This model is released under the Apache 2.0 License, same as the original Breeze ASR 25 model.

## Credits

- Base model: MediaTek Research (Breeze-ASR-25)
- Conversion tooling: WhisperKit and whisperkittools by Argmax

## Citation

If you use this model, please cite the original Breeze ASR 25 paper:

```bibtex
@article{chou2025selfrefiningframeworkenhancingasr,
  title={A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data},
  author={Cheng Kang Chou and Chan-Jan Hsu and Ho-Lam Chung and Liang-Hsuan Tseng and Hsi-Chun Cheng and Yu-Kuan Fu and Kuan Po Huang and Hung-Yi Lee},
  journal={arXiv preprint arXiv:2506.11130},
  year={2025}
}
```

