Automatic Speech Recognition for Shona


Model Description 🫐

This model is a fine-tuned version of Wav2Vec2-BERT 2.0 for Shona automatic speech recognition (ASR). It was trained on 72 hours of transcribed Shona speech and achieves a word error rate (WER) below 23% on in-domain test data.

  • Developed by: Badr al-Absi
  • Model type: Speech Recognition (ASR)
  • Language: Shona (sn)
  • License: CC-BY-4.0
  • Finetuned from: facebook/w2v-bert-2.0
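
The reported WER figure refers to in-domain test data. To evaluate the model on your own Shona recordings, the word error rate can be computed with the Hugging Face evaluate library. A minimal sketch (the strings below are placeholders, not real transcripts):

import evaluate

# load the WER metric (requires `pip install evaluate jiwer`)
wer_metric = evaluate.load("wer")

references = ["mhoro shamwari"]   # ground-truth transcripts (placeholder)
predictions = ["mhoro shamwari"]  # model outputs (placeholder)

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")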

Direct Use

The model can be used directly to transcribe Shona audio:

from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-shona-asr")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-shona-asr")

# load audio
audio_input, sample_rate = torchaudio.load("path/to/audio.wav")

# resample to 16 kHz if needed (the model expects 16 kHz mono audio)
if sample_rate != 16000:
    audio_input = torchaudio.functional.resample(audio_input, sample_rate, 16000)
    sample_rate = 16000

# preprocess
inputs = processor(audio_input.squeeze(), sampling_rate=sample_rate, return_tensors="pt")

# inference
with torch.no_grad():
    logits = model(**inputs).logits

# decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
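
Alternatively, the transformers pipeline API bundles model loading, preprocessing, and CTC decoding into a single call. A minimal sketch (the audio path is a placeholder):

from transformers import pipeline

# the ASR pipeline handles feature extraction and decoding internally
asr = pipeline("automatic-speech-recognition", model="badrex/w2v-bert-2.0-shona-asr")

result = asr("path/to/audio.wav")
print(result["text"])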

Downstream Use

This model can be used as a foundation for:

  • building voice assistants for Shona speakers
  • transcription services for Shona content (see the chunked-inference sketch after this list)
  • accessibility tools for Shona-speaking communities
  • research in low-resource speech recognition
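
For longer recordings, such as those handled by a transcription service, the same pipeline can chunk the audio and stitch the CTC output back together. A hedged sketch, with chunk and stride lengths chosen as reasonable defaults rather than tuned values:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="badrex/w2v-bert-2.0-shona-asr")

# split long audio into 30 s windows with 5 s overlap; outputs are merged automatically
result = asr("path/to/long_recording.wav", chunk_length_s=30, stride_length_s=5)
print(result["text"])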

Model Architecture

  • Base model: Wav2Vec2-BERT 2.0
  • Architecture: Conformer-style transformer encoder (convolution-augmented self-attention) with a linear CTC head
  • Parameters: ~600M (inherited from base model)
  • Objective: connectionist temporal classification (CTC)
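
The approximate parameter count can be confirmed directly from the released checkpoint. A minimal sketch:

from transformers import Wav2Vec2BertForCTC

model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-shona-asr")

# sum the sizes of all parameter tensors in the checkpoint
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")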

Funding

The development of this model was supported by CLEAR Global and the Gates Foundation.

Citation

@misc{w2v_bert_shona_asr,
  author = {Badr M. Abdullah},
  title = {Adapting Wav2Vec2-BERT 2.0 for Shona ASR},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/badrex/w2v-bert-2.0-shona-asr}
}

Model Card Contact

For questions or issues, please open a discussion in the Community tab of the Hugging Face model repository.
