# Omni-ASR CTC CoreML Models
CoreML-optimized versions of Meta's Omni-ASR CTC models for on-device speech recognition on Apple platforms (iOS 17+, macOS 14+).
These models run entirely on-device using Apple's Neural Engine (ANE), with no cloud dependency.
## Available Models
| Model | Parameters | Precision | Size | Recommended |
|---|---|---|---|---|
| OmniASR_CTC_300M_int8 | 300M | INT8 | 312 MB | Yes |
| OmniASR_CTC_300M_fp16 | 300M | FP16 | 621 MB | |
| OmniASR_CTC_1B_int8 | 1B | INT8 | 933 MB | |
| OmniASR_CTC_1B_fp16 | 1B | FP16 | 1.8 GB | |
The 300M INT8 variant offers the best trade-off between accuracy and latency for real-time use on iPhone.
## Architecture
- Backbone: wav2vec2 Conformer encoder (fairseq2)
- Head: CTC (Connectionist Temporal Classification)
- Feature extractor: Convolutional, stride 320 (20ms per frame at 16kHz)
- Vocabulary: 9,813 multilingual SentencePiece tokens (shared across all variants)
- Training: Dynamic Chunk Training with ~10% full-context passes
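Given the 320-sample stride of the feature extractor, the time dimension of the output follows directly from the input length. A quick sanity check in Python (the helper name is mine, not from the repository):

```python
SAMPLE_RATE = 16_000   # Hz, the model's expected input rate
STRIDE = 320           # samples per output frame (20 ms at 16 kHz)

def num_frames(num_samples: int) -> int:
    """Time dimension of the logits for a given number of input samples."""
    return num_samples // STRIDE

# 10 s of audio -> 500 frames; each frame covers 20 ms
frame_ms = STRIDE / SAMPLE_RATE * 1000
```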
## Input / Output
| | Description |
|---|---|
| Input | `audio`: Float16 MultiArray `[1, T]` → raw 16 kHz mono audio samples |
| Output | `logits`: Float16 MultiArray `[1, T/320, 9813]` → CTC log-probabilities |
Supported input lengths (enumerated shapes):
- `[1, 160000]` → 10 seconds
- `[1, 320000]` → 20 seconds
- `[1, 640000]` → 40 seconds
Shorter audio is zero-padded to the nearest shape; the CTC decoder trims to actual length.
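The padding step can be sketched as follows; this is an illustrative helper under the enumerated shapes above, not code from the repository:

```python
ENUM_LENGTHS = [160_000, 320_000, 640_000]  # 10 s / 20 s / 40 s at 16 kHz

def pad_to_enumerated(samples: list[float]) -> tuple[list[float], int]:
    """Zero-pad to the smallest enumerated length that fits.

    Returns the padded buffer plus the original length, which the
    CTC decoder uses to trim trailing padding frames.
    """
    n = len(samples)
    for target in ENUM_LENGTHS:
        if n <= target:
            return samples + [0.0] * (target - n), n
    raise ValueError("clip exceeds 40 s; split it before inference")
```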
## Performance (iPhone 15 Pro, ANE)
| Model | 4s audio | 20s audio | 40s audio |
|---|---|---|---|
| 300M INT8 | ~100 ms | ~500 ms | ~1.2 s |
| 1B INT8 | ~300 ms | ~1.5 s | ~3.5 s |
## Usage
### Download a model
```bash
pip install huggingface_hub

# Download 300M INT8 (recommended)
huggingface-cli download ChipCracker/omni-asr-coreml \
  OmniASR_CTC_300M_int8.mlmodelc --local-dir ./models
```
### Load in Swift
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

let model = try await MLModel.load(
    contentsOf: modelURL,
    configuration: config
)
```
### Decode with greedy CTC
After `model.prediction(from: features)` returns the logits:

1. Argmax over the vocabulary dimension
2. Remove consecutive duplicates
3. Remove the blank token (index 0)
4. Map indices to vocabulary tokens
5. Join and replace the SentencePiece boundary marker (`▁`, U+2581) with a space
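The steps above can be sketched in Python as a minimal reference implementation; the function name and toy vocabulary are mine, and only the blank index (0) and the `▁` boundary marker come from this document:

```python
BLANK_ID = 0  # blank token index, per step 3

def greedy_ctc_decode(logits: list[list[float]], vocab: list[str]) -> str:
    # 1. Argmax over the vocabulary dimension, one token id per frame
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    # 2. Remove consecutive duplicates
    collapsed = [t for i, t in enumerate(ids) if i == 0 or t != ids[i - 1]]
    # 3. Remove blanks, then 4. map indices to vocabulary tokens
    tokens = [vocab[t] for t in collapsed if t != BLANK_ID]
    # 5. Join and replace the SentencePiece boundary (U+2581) with a space
    return "".join(tokens).replace("\u2581", " ").strip()
```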
## iOS App
These models are used by the omni-asr iOS app, which provides:
- Live transcription with growing context
- On-demand model download from this repository
- Full offline operation after download
## Export
Models were exported from PyTorch using coremltools 9.0:
```bash
omni-asr-export \
  --model-card omniASR_CTC_300M \
  --output OmniASR_CTC_300M_int8.mlpackage
# INT8 quantization is applied by default
```
INT8 variants use post-training linear symmetric weight quantization, reducing size ~2x with minimal accuracy loss.
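Post-training linear symmetric weight quantization can be illustrated as below. This is a per-tensor sketch of the general technique under stated assumptions, not the coremltools implementation:

```python
def quantize_symmetric_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map weights to int8 via w ~= scale * q, with q in [-127, 127].

    Symmetric: a single scale, no zero-point offset.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid div-by-zero
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [scale * v for v in q]
```

Storing one int8 per weight instead of one float16 is what yields the roughly 2x size reduction seen in the table above (e.g. 621 MB → 312 MB for the 300M model).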
## File Structure
Each .mlmodelc directory contains:
```
OmniASR_CTC_300M_int8.mlmodelc/
├── coremldata.bin            # Model graph serialization
├── metadata.json             # CoreML metadata
├── model.mil                 # ML Intermediate Language
├── analytics/coremldata.bin
└── weights/weight.bin        # Model weights (largest file)
```
## Citation
```bibtex
@article{pratap2023scaling,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```
## License
The CoreML conversion and app code are provided under CC-BY-NC-4.0. The original Omni-ASR model weights are subject to Meta's license terms.