Model Card for Whisper Small Turkish

This model is a fine-tuned version of openai/whisper-small on the Mozilla Common Voice 23.0 Turkish dataset.

Key Features & Robustness

Standard ASR models often degrade sharply in noisy environments. This model tackles that problem with JIT (Just-In-Time) augmentation: synthetic degradations are generated on the fly during training rather than baked into the dataset.

The model was exposed to the following synthetic degradations, applied dynamically inside the training loop (see the sketch below):

  • Gaussian Noise Injection: Simulating background static and environmental noise.
  • Time Stretching: Randomly speeding up or slowing down speech (0.8x - 1.2x) to handle fast/slow speakers.
  • Frequency Masking: Simulating codec loss or bad microphone quality.

Result: The model demonstrates high resilience to noise, maintaining transcription accuracy even when the input audio has a low Signal-to-Noise Ratio (SNR).
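
The exact augmentation code is not published with this card. Below is a minimal Python sketch of the idea using torch/torchaudio; the probabilities, SNR range, and mask width are illustrative assumptions (only the 0.8x-1.2x stretch range comes from the list above).

import random

import torch
import torchaudio

freq_masker = torchaudio.transforms.FrequencyMasking(freq_mask_param=27)

def augment_waveform(wav: torch.Tensor, sr: int = 16000) -> torch.Tensor:
    """Waveform-level degradations, re-drawn randomly for every example."""
    # Gaussian noise injection at a random SNR (probability and range assumed)
    if random.random() < 0.5:
        snr_db = random.uniform(5.0, 20.0)
        signal_rms = wav.pow(2).mean().sqrt().clamp_min(1e-9)
        wav = wav + torch.randn_like(wav) * (signal_rms / 10 ** (snr_db / 20))

    # Time stretching in the 0.8x-1.2x range from the list above; resampling
    # and playing back at the original rate changes speed (and pitch)
    if random.random() < 0.5:
        rate = random.uniform(0.8, 1.2)
        wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=int(sr / rate))

    return wav

def augment_features(log_mel: torch.Tensor) -> torch.Tensor:
    """Frequency masking: zero out a random band of the log-mel bins fed to Whisper."""
    return freq_masker(log_mel)

In a Trainer setup these functions would typically run inside the dataset's transform or the data collator, so every epoch sees different degradations of the same utterances.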

Performance

Metric                  Condition                Performance
WER (Word Error Rate)   Clean Audio              ~20%
WER (Word Error Rate)   Noisy/Distorted Audio    ~20% (Robust)
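
For reference, WER can be computed with the Hugging Face evaluate library; a minimal sketch with placeholder strings:

import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["merhaba dünya"],   # model transcriptions
    references=["merhaba dünyaya"],  # ground-truth transcripts
)
print(f"WER: {wer:.2%}")  # fraction of word-level edits against the reference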

WandB

WandB report

Usage

You can use this model directly with the Hugging Face pipeline.

import torch
from transformers import pipeline

# 1. Load the pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model="ogulcanakca/whisper-small-tr",
    device=device,
    generate_kwargs={
        "length_penalty": 1.5,                # favor longer hypotheses during beam search
        "no_repeat_ngram_size": 2,            # block repeated 2-grams (anti-looping)
        "language": "turkish",                # force Turkish decoding
        "task": "transcribe",                 # transcribe, do not translate
        "compression_ratio_threshold": 1.35,  # reject degenerate, repetitive segments
    },
)

# 2. Transcribe audio (can be a file path or URL)
# The model handles resampling automatically.
result = pipe("path_to_your_audio.mp3")

print(result["text"])
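
The pipeline also accepts audio that is already loaded in memory; a small sketch assuming the soundfile package (any mono float array plus its sampling rate works):

import soundfile as sf

# Pass a raw mono array and its sampling rate instead of a file path
audio, sr = sf.read("path_to_your_audio.wav")
result = pipe({"raw": audio, "sampling_rate": sr})
print(result["text"])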

Parameter Details

  • per_device_train_batch_size=64
  • gradient_accumulation_steps=1
  • gradient_checkpointing=False
  • fp16=True
  • dataloader_num_workers=8
  • dataloader_pin_memory=True
  • learning_rate=1e-5
  • num_train_epochs=5
  • per_device_eval_batch_size=32
  • predict_with_generate=True
  • generation_max_length=225
  • save_steps=1000
  • eval_steps=1000
  • warmup_steps=500
  • logging_steps=10

Training took approximately 67 minutes on a single A100 GPU (80 GB).
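
These settings map one-to-one onto transformers.Seq2SeqTrainingArguments; a minimal sketch (output_dir and eval_strategy are assumptions, not listed above):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-tr",  # placeholder path
    per_device_train_batch_size=64,
    gradient_accumulation_steps=1,
    gradient_checkpointing=False,
    fp16=True,
    dataloader_num_workers=8,
    dataloader_pin_memory=True,
    learning_rate=1e-5,
    num_train_epochs=5,
    per_device_eval_batch_size=32,
    predict_with_generate=True,
    generation_max_length=225,
    eval_strategy="steps",  # assumed, so eval_steps takes effect (evaluation_strategy in older transformers)
    save_steps=1000,
    eval_steps=1000,
    warmup_steps=500,
    logging_steps=10,
)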
