---
license: apache-2.0
language:
  - en
  - multilingual
tags:
- whisper
- speech-to-text
- pruna
- quantized
- 8bit
- optimized
- unsloth
library_name: transformers
base_model:
- unsloth/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Unsloth Whisper Large V3 Turbo - Pruna 8bit Optimized

This model is a Pruna-optimized version of `unsloth/whisper-large-v3-turbo` with 8-bit quantization applied.

## Optimizations Applied

- **Batcher Optimization**: int8 enabled (`whisper_s2t_int8: True`)
- **Compiler**: `c_whisper`
- **Batcher**: `whisper_s2t`
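
Putting the settings above together, the bundled `smash_config.json` plausibly looks like the following (illustrative sketch only; the keys and values are taken from the list above, and the file you download is authoritative):

```json
{
  "compiler": "c_whisper",
  "batcher": "whisper_s2t",
  "whisper_s2t_int8": true
}
```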

## Usage

### Option 1: Standard Transformers (Recommended for most users)

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Simple loading - no Pruna installation required
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Transcribe a 16 kHz mono waveform (e.g. loaded with librosa or soundfile)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
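
If you prefer the high-level `pipeline` API, the same checkpoint works for one-line transcription (a sketch: the audio path and device settings below are placeholders to adapt to your setup):

```python
import torch
from transformers import pipeline

# Build an ASR pipeline around this checkpoint; chunking handles audio
# longer than Whisper's 30-second context window.
asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit",
    torch_dtype=torch.float16,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    chunk_length_s=30,
)

result = asr("audio.wav")  # path to a local audio file (placeholder)
print(result["text"])
```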

### Option 2: With Pruna Optimization (Maximum Performance)

```python
from pruna import smash, SmashConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoTokenizer, AutoProcessor
import json

# Load model and tokenizer
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
tokenizer = AutoTokenizer.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Load the SmashConfig bundled with this repo
from huggingface_hub import hf_hub_download
config_path = hf_hub_download("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit", "smash_config.json")
with open(config_path, "r") as f:
    config_dict = json.load(f)

# Recreate SmashConfig
smash_config = SmashConfig()
for key, value in config_dict.items():
    smash_config[key] = value

# Apply Pruna optimizations
smashed_model = smash(
    model=model,
    smash_config=smash_config
)

# Use the optimized model
result = smashed_model.inference(audio_input)
```

## Performance Benefits

- Reduced memory usage from 8-bit weight quantization
- Optimized inference pipeline with int8 batcher
- Transcription quality comparable to the unquantized base model
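
To check the memory reduction on your own hardware, `transformers` models expose `get_memory_footprint()` (a sketch; actual numbers depend on your environment and load settings):

```python
from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit"
)
# Total bytes of parameters and buffers currently held in memory
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```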

## Base Model

This model is based on `unsloth/whisper-large-v3-turbo`, which itself is optimized from `openai/whisper-large-v3-turbo`. It retains all the capabilities of both base models while providing additional Pruna performance improvements.