---
license: apache-2.0
language:
  - en
  - multilingual
tags:
- whisper
- speech-to-text
- pruna
- quantized
- 8bit
- optimized
- unsloth
library_name: transformers
base_model:
- unsloth/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Unsloth Whisper Large V3 Turbo - Pruna 8bit Optimized

This model is a Pruna-optimized version of `unsloth/whisper-large-v3-turbo` with 8-bit quantization applied.

## Optimizations Applied

- **Batcher Optimization**: int8 enabled (`whisper_s2t_int8: True`)
- **Compiler**: `c_whisper`
- **Batcher**: `whisper_s2t`
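
Putting the settings above together, the bundled `smash_config.json` plausibly looks like the following (illustrative sketch only; the keys and values are taken from the list above, and the file you download is authoritative):

```json
{
  "compiler": "c_whisper",
  "batcher": "whisper_s2t",
  "whisper_s2t_int8": true
}
```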

## Usage

### Option 1: Standard Transformers (Recommended for most users)

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Simple loading - no Pruna installation required
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Transcribe a 16 kHz mono waveform (e.g. loaded with librosa or soundfile)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
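
If you prefer the high-level `pipeline` API, the same checkpoint works for one-line transcription (a sketch: the audio path and device settings below are placeholders to adapt to your setup):

```python
import torch
from transformers import pipeline

# Build an ASR pipeline around this checkpoint; chunking handles audio
# longer than Whisper's 30-second context window.
asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit",
    torch_dtype=torch.float16,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    chunk_length_s=30,
)

result = asr("audio.wav")  # path to a local audio file (placeholder)
print(result["text"])
```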

### Option 2: With Pruna Optimization (Maximum Performance)

```python
from pruna import smash, SmashConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoTokenizer, AutoProcessor
import json

# Load model and tokenizer
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
tokenizer = AutoTokenizer.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Load the SmashConfig bundled with this repo
from huggingface_hub import hf_hub_download
config_path = hf_hub_download("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit", "smash_config.json")
with open(config_path, "r") as f:
    config_dict = json.load(f)

# Recreate SmashConfig
smash_config = SmashConfig()
for key, value in config_dict.items():
    smash_config[key] = value

# Apply Pruna optimizations
smashed_model = smash(
    model=model,
    smash_config=smash_config
)

# Use the optimized model
result = smashed_model.inference(audio_input)
```

## Performance Benefits

- Reduced memory usage from 8-bit weight quantization
- Optimized inference pipeline with int8 batcher
- Transcription quality comparable to the unquantized base model
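
To check the memory reduction on your own hardware, `transformers` models expose `get_memory_footprint()` (a sketch; actual numbers depend on your environment and load settings):

```python
from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit"
)
# Total bytes of parameters and buffers currently held in memory
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```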

## Base Model

This model is based on `unsloth/whisper-large-v3-turbo`, which itself is optimized from `openai/whisper-large-v3-turbo`. It retains all the capabilities of both base models while providing additional Pruna performance improvements.