---
license: apache-2.0
language:
- en
- multilingual
tags:
- whisper
- speech-to-text
- pruna
- quantized
- 8bit
- optimized
- unsloth
library_name: transformers
base_model:
- unsloth/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---
# Unsloth Whisper Large V3 Turbo - Pruna 8bit Optimized
This model is a Pruna-optimized version of `unsloth/whisper-large-v3-turbo` with 8-bit quantization applied.
## Optimizations Applied
- **Batcher Optimization**: int8 enabled (`whisper_s2t_int8: True`)
- **Compiler**: `c_whisper`
- **Batcher**: `whisper_s2t`
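For reference, the settings above correspond to a configuration dictionary along these lines. This is a hypothetical sketch using the key names from the list above; the `smash_config.json` shipped in this repository is the authoritative schema.

```python
# Hypothetical shape of the Pruna configuration described above;
# consult the repo's smash_config.json for the exact schema.
config_dict = {
    "compiler": "c_whisper",     # compiler backend
    "batcher": "whisper_s2t",    # batching backend
    "whisper_s2t_int8": True,    # enable int8 in the batcher
}
```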
## Usage
### Option 1: Standard Transformers (Recommended for most users)
```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Simple loading - no Pruna installation required
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# `audio_array` is your 16 kHz mono waveform as a float NumPy array
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
### Option 2: With Pruna Optimization (Maximum Performance)
```python
from pruna import smash, SmashConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoTokenizer, AutoProcessor
import json

# Load model, tokenizer, and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
tokenizer = AutoTokenizer.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Load the SmashConfig shipped with this repository
with open("smash_config.json", "r") as f:
    config_dict = json.load(f)

# Recreate the SmashConfig
smash_config = SmashConfig()
for key, value in config_dict.items():
    smash_config[key] = value

# Apply Pruna optimizations
smashed_model = smash(model=model, smash_config=smash_config)

# Use the optimized model (`audio_input` is your audio source)
result = smashed_model.inference(audio_input)
```
## Performance Benefits
- Reduced memory usage from 8-bit weight quantization
- Optimized inference pipeline with int8 batcher
- Transcription quality preserved relative to the base model
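As a rough illustration of the memory saving from 8-bit weights (assuming an fp16 baseline and roughly 809M parameters for whisper-large-v3-turbo; actual footprints also include activations and runtime overhead):

```python
# Back-of-envelope weight memory for fp16 vs int8 storage
params = 809_000_000          # approximate parameter count (assumption)
fp16_gb = params * 2 / 1e9    # 2 bytes per weight
int8_gb = params * 1 / 1e9    # 1 byte per weight
print(f"fp16: {fp16_gb:.2f} GB, int8: {int8_gb:.2f} GB")  # int8 is half the weight memory
```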
## Base Model
This model is based on `unsloth/whisper-large-v3-turbo`, which is itself derived from `openai/whisper-large-v3-turbo`. It retains the capabilities of both base models while adding Pruna performance optimizations.