Omni-ASR CTC CoreML Models

CoreML-optimized versions of Meta's Omni-ASR CTC models for on-device speech recognition on Apple platforms (iOS 17+, macOS 14+).

These models run entirely on-device using Apple's Neural Engine (ANE), with no cloud dependency.

Available Models

Model                   Parameters  Precision  Size    Recommended
OmniASR_CTC_300M_int8   300M        INT8       312 MB  Yes
OmniASR_CTC_300M_fp16   300M        FP16       621 MB
OmniASR_CTC_1B_int8     1B          INT8       933 MB
OmniASR_CTC_1B_fp16     1B          FP16       1.8 GB

The 300M INT8 variant offers the best trade-off between accuracy and latency for real-time use on iPhone.

Architecture

  • Backbone: wav2vec2 Conformer encoder (fairseq2)
  • Head: CTC (Connectionist Temporal Classification)
  • Feature extractor: Convolutional, stride 320 (20ms per frame at 16kHz)
  • Vocabulary: 9,813 multilingual SentencePiece tokens (shared across all variants)
  • Training: Dynamic Chunk Training with ~10% full-context passes
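
The stride figure above fixes the model's frame rate; a quick sanity check of the arithmetic (plain Python, nothing model-specific):

```python
# Frame-rate arithmetic implied by the conv feature extractor:
# a stride of 320 samples at 16 kHz yields one frame every 20 ms.
SAMPLE_RATE = 16_000   # Hz
STRIDE = 320           # samples per output frame

frame_duration_ms = STRIDE / SAMPLE_RATE * 1000
frames_per_second = SAMPLE_RATE // STRIDE

print(frame_duration_ms)   # 20.0
print(frames_per_second)   # 50

# A 10-second input of 160,000 samples therefore yields 500 frames,
# matching the [1, T/320, 9813] logits shape described below.
frames_10s = 160_000 // STRIDE
print(frames_10s)          # 500
```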

Input / Output

Input audio: Float16 MultiArray of shape [1, T] – raw 16 kHz mono audio samples
Output logits: Float16 MultiArray of shape [1, T/320, 9813] – CTC log-probabilities

Supported input lengths (enumerated shapes):

  • [1, 160000] – 10 seconds
  • [1, 320000] – 20 seconds
  • [1, 640000] – 40 seconds

Shorter audio is zero-padded to the nearest shape; the CTC decoder trims to actual length.
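
The padding step can be sketched as follows (a minimal NumPy illustration; `pad_to_enumerated` and its return convention are hypothetical names, not part of the model's API):

```python
import numpy as np

# Enumerated input lengths supported by the model: 10 s / 20 s / 40 s at 16 kHz
ENUMERATED_LENGTHS = (160_000, 320_000, 640_000)

def pad_to_enumerated(audio: np.ndarray) -> tuple[np.ndarray, int]:
    """Zero-pad mono audio (shape [T]) to the smallest supported length.

    Returns the padded [1, T'] array plus the original sample count, so the
    CTC decoder can later trim logits back to the real duration.
    """
    n = len(audio)
    for target in ENUMERATED_LENGTHS:
        if n <= target:
            padded = np.zeros(target, dtype=np.float16)
            padded[:n] = audio
            return padded[None, :], n
    raise ValueError(f"audio of {n} samples exceeds the 40 s maximum")

padded, orig_len = pad_to_enumerated(np.ones(100_000, dtype=np.float16))
print(padded.shape, orig_len)  # (1, 160000) 100000
```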

Performance (iPhone 15 Pro, ANE)

Model      4s audio  20s audio  40s audio
300M INT8  ~100 ms   ~500 ms    ~1.2 s
1B INT8    ~300 ms   ~1.5 s     ~3.5 s
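
These latencies correspond to real-time factors (latency divided by audio duration) well below 1; for example, using the approximate 300M INT8 figures:

```python
# Real-time factor (RTF) = inference latency / audio duration,
# from the approximate 300M INT8 numbers in the table above.
latencies_s = {4: 0.10, 20: 0.50, 40: 1.2}  # audio seconds -> latency seconds

rtfs = {audio_s: latency_s / audio_s for audio_s, latency_s in latencies_s.items()}
for audio_s, rtf in rtfs.items():
    print(f"{audio_s:>2}s audio: RTF ~ {rtf:.3f}")
```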

Usage

Download a model

pip install huggingface_hub
# Download 300M INT8 (recommended)
huggingface-cli download ChipCracker/omni-asr-coreml \
    OmniASR_CTC_300M_int8.mlmodelc --local-dir ./models

Load in Swift

import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

let model = try await MLModel.load(
    contentsOf: modelURL,
    configuration: config
)

Decode with greedy CTC

// After model.prediction(from: features):
// 1. Argmax over vocabulary dimension
// 2. Remove consecutive duplicates
// 3. Remove blank token (index 0)
// 4. Map indices to vocabulary tokens
// 5. Join and replace SentencePiece boundary (▁) with space
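
The five steps above can be sketched in Python (NumPy stands in for the MLMultiArray, and the three-token vocabulary is a toy stand-in for the real 9,813-token one):

```python
import numpy as np

def greedy_ctc_decode(logits: np.ndarray, vocab: list[str], blank: int = 0) -> str:
    """Greedy CTC decode of [T, V] logits following the steps above."""
    # 1. Argmax over the vocabulary dimension
    ids = logits.argmax(axis=-1)
    # 2. Remove consecutive duplicates
    deduped = [i for k, i in enumerate(ids) if k == 0 or i != ids[k - 1]]
    # 3. Remove the blank token (index 0)
    tokens = [vocab[i] for i in deduped if i != blank]
    # 4./5. Map indices to tokens, join, and replace the
    # SentencePiece boundary marker (U+2581) with a space
    return "".join(tokens).replace("\u2581", " ").strip()

# Toy vocabulary and logits purely for illustration
vocab = ["<blank>", "\u2581hi", "\u2581there"]
logits = np.log(np.array([
    [0.1, 0.8, 0.1],   # -> "hi"
    [0.1, 0.8, 0.1],   # duplicate, removed
    [0.8, 0.1, 0.1],   # blank, removed
    [0.1, 0.1, 0.8],   # -> "there"
]))
print(greedy_ctc_decode(logits, vocab))  # "hi there"
```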

iOS App

These models are used by the omni-asr iOS app, which provides:

  • Live transcription with growing context
  • On-demand model download from this repository
  • Full offline operation after download

Export

Models were exported from PyTorch using coremltools 9.0:

omni-asr-export \
    --model-card omniASR_CTC_300M \
    --output OmniASR_CTC_300M_int8.mlpackage
# INT8 quantization is applied by default

INT8 variants use post-training linear symmetric weight quantization, reducing size ~2x with minimal accuracy loss.
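
The scheme can be illustrated with a toy NumPy sketch (this demonstrates the idea of linear symmetric weight quantization, not the actual coremltools implementation):

```python
import numpy as np

def quantize_symmetric_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to int8 with a single symmetric scale (zero-point 0)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_symmetric_int8(w)

# Round-trip error is bounded by half the quantization step
err = float(np.abs(dequantize(q, scale) - w).max())
print(q.dtype, err <= scale)  # int8 True
```

Each int8 weight occupies one byte instead of two (FP16), which is where the ~2x size reduction comes from.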

File Structure

Each .mlmodelc directory contains:

OmniASR_CTC_300M_int8.mlmodelc/
├── coremldata.bin          # Model graph serialization
├── metadata.json           # CoreML metadata
├── model.mil               # ML Intermediate Language
├── analytics/coremldata.bin
└── weights/weight.bin      # Model weights (largest file)

Citation

@article{pratap2023scaling,
    title={Scaling Speech Technology to 1,000+ Languages},
    author={Pratap, Vineel and others},
    journal={arXiv preprint arXiv:2305.13516},
    year={2023}
}

License

The CoreML conversion and app code are provided under CC-BY-NC-4.0. The original Omni-ASR model weights are subject to Meta's license terms.
