IberLang Classifier

Model Overview

The IberLang classifier is a fine-tuned version of the Whisper Medium model, built for spoken-language identification across the Iberian linguistic spectrum. It is trained to distinguish Spanish, Catalan, Galician, Basque (Euskera), and Occitan, extending Whisper's multilingual capabilities to regional language identification tasks.

The pre-trained base model used for fine-tuning is openai/whisper-medium.

Quickstart

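import librosa
import torch
from transformers import pipeline

# Load the classifier; use the first GPU if available, otherwise the CPU.
classifier = pipeline(
    "audio-classification",
    model="Ugiat/IberLang",
    device=0 if torch.cuda.is_available() else -1,
)

audio_path = "sample.wav"

# Load the file as mono audio resampled to 16 kHz, the rate Whisper expects.
audio, _ = librosa.load(audio_path, sr=16000)

# The pipeline returns a list of {"label", "score"} dicts, highest score first.
prediction = classifier(audio)

print(prediction[0]["label"])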
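
Because the pipeline returns a list of dictionaries with "label" and "score" keys, downstream code can check the confidence before accepting the top label. The following is a minimal sketch of that idea, not part of the model's API: the helper name `pick_language`, the 0.5 threshold, and the label strings in the mock prediction are all assumptions made for illustration, and the model's actual label set may differ.

```python
from typing import Optional


def pick_language(predictions: list[dict], min_score: float = 0.5) -> Optional[str]:
    """Return the top-scoring label, or None if its score is below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


# Mock output in the shape returned by an audio-classification pipeline.
# The label strings are hypothetical placeholders, not the model's real labels.
mock_prediction = [
    {"label": "catalan", "score": 0.81},
    {"label": "occitan", "score": 0.14},
    {"label": "spanish", "score": 0.05},
]

print(pick_language(mock_prediction))       # catalan
print(pick_language(mock_prediction, 0.9))  # None
```

In real use, `mock_prediction` would be replaced by the `prediction` list from the Quickstart, and the threshold would be tuned on held-out audio.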

Performance Evaluation

We evaluated the fine-tuned IberLang classifier against Whisper Large V3 on a held-out subset of our custom VoxLingua107 IberLang dataset containing 1,200 audio clips. IberLang matches or exceeds Whisper Large V3 on every language, with the largest gains on the minority languages: Basque (0.96 vs. 0.68), Galician (0.915 vs. 0.188), and Occitan (0.655 vs. 0.0).

Model             Catalan  Basque  Galician  Occitan  Spanish
IberLang          0.902    0.96    0.915     0.655    1.0
Whisper-Large-V3  0.902    0.68    0.188     0.0      0.978
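
To make the comparison easier to read, the per-language differences can be computed directly from the table. The sketch below does only that arithmetic. The unweighted mean across languages is a convenience summary added here, not a figure reported by the authors, and it assumes the five per-language scores are directly comparable.

```python
languages = ["Catalan", "Basque", "Galician", "Occitan", "Spanish"]

# Scores copied from the table above.
scores = {
    "IberLang": [0.902, 0.96, 0.915, 0.655, 1.0],
    "Whisper-Large-V3": [0.902, 0.68, 0.188, 0.0, 0.978],
}

# Per-language gain of IberLang over Whisper Large V3.
deltas = {
    lang: round(ours - base, 3)
    for lang, ours, base in zip(
        languages, scores["IberLang"], scores["Whisper-Large-V3"]
    )
}

# Unweighted mean over the five languages (an illustrative summary only).
means = {model: round(sum(vals) / len(vals), 4) for model, vals in scores.items()}

for lang, delta in deltas.items():
    print(f"{lang:<9} {delta:+.3f}")
print(means)
```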

Fine-Tuning Process

The fine-tuning process followed a structured approach, including dataset preparation, model training, and optimization:

  • Data Splitting: The dataset was shuffled and split into training (90%) and testing (10%) subsets.
  • Training Setup:
    • Batch size: 4
    • Gradient accumulation steps: 8
    • Epochs: 3
    • Learning rate: 1e-5
    • Scheduler: Linear
    • Evaluation frequency: Every 300 steps
    • Checkpointing: Every 300 steps
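
The settings above can be read together as follows. This is a rough sketch in plain Python, not the training script used for the model. The random seed, the absence of warmup in the linear schedule, single-device training, and the 1,000-item placeholder dataset are assumptions for illustration, and the helper names `shuffle_and_split` and `linear_lr` are hypothetical.

```python
import math
import random

# Hyperparameters as listed above.
config = {
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "epochs": 3,
    "learning_rate": 1e-5,
    "eval_every_steps": 300,
    "save_every_steps": 300,
}


def shuffle_and_split(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def linear_lr(step, total_steps, base_lr):
    """Linear decay from base_lr to zero (no warmup; an assumption)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


# Samples contributing to each optimizer update on a single device.
effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]

# Hypothetical dataset of 1,000 clip IDs, used only to show the arithmetic.
train, test = shuffle_and_split(range(1000))
steps_per_epoch = math.ceil(len(train) / effective_batch)
total_steps = steps_per_epoch * config["epochs"]

print(effective_batch)               # 32
print(len(train), len(test))         # 900 100
print(steps_per_epoch, total_steps)  # 29 87
```

With a batch size of 4 and 8 accumulation steps, each optimizer update sees 32 samples per device, which is the figure that matters when relating the learning rate and the 300-step evaluation interval to the dataset size.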

License

This model, IberLang, is a fine-tuned version of Whisper Medium by OpenAI; the base model is licensed under the Apache License 2.0.

Fine-tuning and additional modifications were performed by Ugiat Technologies to improve multilingual language identification for Catalan, Galician, Basque, Spanish, and Occitan.

The resulting model and associated documentation are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

When using this model, please cite both the original Whisper project and this fine-tuned version as appropriate.
