Model Card for Whisper Small Turkish

This model is a fine-tuned version of openai/whisper-small on the Mozilla Common Voice 23.0 Turkish dataset.

Key Features & Robustness

Standard ASR models often degrade sharply in noisy environments. This model tackles that problem with JIT (Just-In-Time) augmentation: synthetic degradations are generated on the fly during training rather than baked into the dataset.

The model was exposed to the following synthetic degradations, applied dynamically inside the training loop (see the sketch below):

  • Gaussian Noise Injection: Simulating background static and environmental noise.
  • Time Stretching: Randomly speeding up or slowing down speech (0.8x - 1.2x) to handle fast/slow speakers.
  • Frequency Masking: Simulating codec loss or bad microphone quality.

Result: The model demonstrates high resilience to noise, maintaining transcription accuracy even when the input audio has a low Signal-to-Noise Ratio (SNR).
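
The exact augmentation code is not published with this card. Below is a minimal Python sketch of the idea using torch/torchaudio; the probabilities, SNR range, and mask width are illustrative assumptions (only the 0.8x-1.2x stretch range comes from the list above).

import random

import torch
import torchaudio

freq_masker = torchaudio.transforms.FrequencyMasking(freq_mask_param=27)

def augment_waveform(wav: torch.Tensor, sr: int = 16000) -> torch.Tensor:
    """Waveform-level degradations, re-drawn randomly for every example."""
    # Gaussian noise injection at a random SNR (probability and range assumed)
    if random.random() < 0.5:
        snr_db = random.uniform(5.0, 20.0)
        signal_rms = wav.pow(2).mean().sqrt().clamp_min(1e-9)
        wav = wav + torch.randn_like(wav) * (signal_rms / 10 ** (snr_db / 20))

    # Time stretching in the 0.8x-1.2x range from the list above; resampling
    # and playing back at the original rate changes speed (and pitch)
    if random.random() < 0.5:
        rate = random.uniform(0.8, 1.2)
        wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=int(sr / rate))

    return wav

def augment_features(log_mel: torch.Tensor) -> torch.Tensor:
    """Frequency masking: zero out a random band of the log-mel bins fed to Whisper."""
    return freq_masker(log_mel)

In a Trainer setup these functions would typically run inside the dataset's transform or the data collator, so every epoch sees different degradations of the same utterances.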

Performance

Metric                  Condition                Performance
WER (Word Error Rate)   Clean Audio              ~20%
WER (Word Error Rate)   Noisy/Distorted Audio    ~20% (Robust)
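
For reference, WER can be computed with the Hugging Face evaluate library; a minimal sketch with placeholder strings:

import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["merhaba dünya"],   # model transcriptions
    references=["merhaba dünyaya"],  # ground-truth transcripts
)
print(f"WER: {wer:.2%}")  # fraction of word-level edits against the reference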

WandB

WandB report

Usage

You can use this model directly with the Hugging Face pipeline.

import torch
from transformers import pipeline

# 1. Load the pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model="ogulcanakca/whisper-small-tr",
    device=device,
    generate_kwargs={
        "length_penalty": 1.5,                # favor longer hypotheses during beam search
        "no_repeat_ngram_size": 2,            # block repeated 2-grams (anti-looping)
        "language": "turkish",                # force Turkish decoding
        "task": "transcribe",                 # transcribe, do not translate
        "compression_ratio_threshold": 1.35,  # reject degenerate, repetitive segments
    },
)

# 2. Transcribe audio (can be a file path or URL)
# The model handles resampling automatically.
result = pipe("path_to_your_audio.mp3")

print(result["text"])
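
The pipeline also accepts audio that is already loaded in memory; a small sketch assuming the soundfile package (any mono float array plus its sampling rate works):

import soundfile as sf

# Pass a raw mono array and its sampling rate instead of a file path
audio, sr = sf.read("path_to_your_audio.wav")
result = pipe({"raw": audio, "sampling_rate": sr})
print(result["text"])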

Parameter Details

  • per_device_train_batch_size=64
  • gradient_accumulation_steps=1
  • gradient_checkpointing=False
  • fp16=True
  • dataloader_num_workers=8
  • dataloader_pin_memory=True
  • learning_rate=1e-5
  • num_train_epochs=5
  • per_device_eval_batch_size=32
  • predict_with_generate=True
  • generation_max_length=225
  • save_steps=1000
  • eval_steps=1000
  • warmup_steps=500
  • logging_steps=10

Training took approximately 67 minutes on a single A100 GPU (80 GB).
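
These settings map one-to-one onto transformers.Seq2SeqTrainingArguments; a minimal sketch (output_dir and eval_strategy are assumptions, not listed above):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-tr",  # placeholder path
    per_device_train_batch_size=64,
    gradient_accumulation_steps=1,
    gradient_checkpointing=False,
    fp16=True,
    dataloader_num_workers=8,
    dataloader_pin_memory=True,
    learning_rate=1e-5,
    num_train_epochs=5,
    per_device_eval_batch_size=32,
    predict_with_generate=True,
    generation_max_length=225,
    eval_strategy="steps",  # assumed, so eval_steps takes effect (evaluation_strategy in older transformers)
    save_steps=1000,
    eval_steps=1000,
    warmup_steps=500,
    logging_steps=10,
)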
