openai/whisper-large

Automatic Speech Recognition

hf-asr-leaderboard

ArthurZ HF Staff committed on Oct 4, 2022

Commit 63138d2 · 1 Parent(s): f63ed3d

Update README.md

Files changed (1)

README.md +2 -4

@@ -208,19 +208,17 @@ The "<|en|>" token is used to specify that the speech is in english and should b
 >>> processor = WhisperProcessor.from_pretrained("openai/whisper-large")
 >>> model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
->>> decoder_input_ids = processor.tokenizer.encode("<|startoftranscript|><|en|><|transcribe|><|notimestamps|>", return_tensors="pt")
 >>> # load dummy dataset and read soundfiles
 >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 >>> # tokenize
 >>> input_features = processor(ds[0]["audio"]["array"], return_tensors="pt").input_features
 >>> # retrieve logits
->>> logits = model(input_features, decoder_input_ids).logits
+>>> logits = model(input_features).logits
 >>> # take argmax and decode
 >>> predicted_ids = torch.argmax(logits, dim=-1)
 >>> transcription = processor.batch_decode(predicted_ids)
-['<|en|><|transcribe|><|notimestamps|> Mr']
+['<|startoftranscript|><|en|><|notimestamps|> Mr']
 ```
 ### French to French
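
The `take argmax and decode` step in this hunk can be illustrated without loading the model. The sketch below is a toy, not the README's code: `TOY_VOCAB`, `toy_logits`, `argmax`, and `greedy_decode` are hypothetical names with made-up values. It assumes the logits hold one row of scores per decoder position, so a single forward pass yields one predicted token per position, which would be consistent with the short output shown in the diff.

```python
# Toy illustration of the "take argmax and decode" step.
# The vocabulary and scores are invented for this sketch; they are not Whisper's.
TOY_VOCAB = ["<|startoftranscript|>", "<|en|>", "<|notimestamps|>", " Mr", " Quilter"]


def argmax(scores):
    """Return the index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def greedy_decode(logits, vocab):
    """Pick the highest-scoring token at each position and join the pieces."""
    ids = [argmax(position_scores) for position_scores in logits]
    return ids, "".join(vocab[i] for i in ids)


# One row per decoder position, one column per vocabulary entry (assumed layout).
toy_logits = [
    [0.1, 2.0, 0.3, 0.0, 0.0],
    [0.0, 0.1, 1.5, 0.2, 0.0],
    [0.0, 0.0, 0.1, 3.0, 0.5],
]

predicted_ids, text = greedy_decode(toy_logits, TOY_VOCAB)
print(predicted_ids, text)
```

Three rows of scores give three predicted tokens here. Producing a longer transcription would need repeated decoding steps rather than a single argmax over one pass.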