facebook/hf-seamless-m4t-large

@@ -27,16 +27,15 @@ This is the "large" variant of the unified model, which enables multiple tasks w
 You can perform all the above tasks from one single model - `SeamlessM4TModel`, but each task also has its own dedicated sub-model.
-## Usage
 First, load the processor and a checkpoint of the model:
 ```python
->>> from transformers import AutoProcessor, SeamlessM4TModel
->>> processor = AutoProcessor.from_pretrained("ylacombe/hf-seamless-m4t-medium")
->>> model = SeamlessM4TModel.from_pretrained("ylacombe/hf-seamless-m4t-medium")
 ```
 You can seamlessly use this model on text or on audio, to generate either translated text or translated audio.
@@ -46,25 +45,43 @@ You can seamlessly use this model on text or on audio, to generated either trans
 You can easily generate translated speech with [`SeamlessM4TModel.generate`]. Here is an example showing how to generate speech from English to Russian.
 ```python
->>> inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
->>> audio_array = model.generate(**inputs, tgt_lang="rus")
->>> audio_array = audio_array[0].cpu().numpy().squeeze()
 ```
 You can also translate directly from a speech waveform. Here is an example from Arabic to English:
 ```python
->>> from datasets import load_dataset
->>> dataset = load_dataset("arabic_speech_corpus", split="test[0:1]")
->>> audio_sample = dataset["audio"][0]["array"]
->>> inputs = processor(audios = audio_sample, return_tensors="pt")
->>> audio_array = model.generate(**inputs, tgt_lang="rus")
->>> audio_array = audio_array[0].cpu().numpy().squeeze()
 ```
 #### Tips
@@ -73,8 +90,8 @@ You can also translate directly from a speech waveform. Here is an example from
 For example, you can replace the previous snippet with the model dedicated to the S2ST task:
 ```python
->>> from transformers import SeamlessM4TForSpeechToSpeech
->>> model = SeamlessM4TForSpeechToSpeech.from_pretrained("ylacombe/hf-seamless-m4t-medium")
 ```
@@ -83,25 +100,25 @@ For example, you can replace the previous snippet with the model dedicated to th
 Similarly, you can generate translated text from text or audio files, this time using the dedicated models.
 ```python
->>> from transformers import SeamlessM4TForSpeechToText
->>> model = SeamlessM4TForSpeechToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
->>> audio_sample = dataset["audio"][0]["array"]
->>> inputs = processor(audios = audio_sample, return_tensors="pt")
->>> output_tokens = model.generate(**inputs, tgt_lang="fra")
->>> translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
 ```
 And from text:
 ```python
->>> from transformers import SeamlessM4TForTextToText
->>> model = SeamlessM4TForTextToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
->>> inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
->>> output_tokens = model.generate(**inputs, tgt_lang="fra")
->>> translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
 ```
 #### Tips

 You can perform all the above tasks from one single model - `SeamlessM4TModel`, but each task also has its own dedicated sub-model.
+## 🤗 Usage
 First, load the processor and a checkpoint of the model:
 ```python
+from transformers import AutoProcessor, SeamlessM4TModel
+processor = AutoProcessor.from_pretrained("ylacombe/hf-seamless-m4t-medium")
+model = SeamlessM4TModel.from_pretrained("ylacombe/hf-seamless-m4t-medium")
 ```
 You can seamlessly use this model on text or on audio, to generate either translated text or translated audio.
 You can easily generate translated speech with [`SeamlessM4TModel.generate`]. Here is an example showing how to generate speech from English to Russian.
 ```python
+inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+audio_array = model.generate(**inputs, tgt_lang="rus")
+audio_array = audio_array[0].cpu().numpy().squeeze()
 ```
 You can also translate directly from a speech waveform. Here is an example from Arabic to English:
 ```python
+from datasets import load_dataset
+dataset = load_dataset("arabic_speech_corpus", split="test[0:1]")
+audio_sample = dataset["audio"][0]["array"]
+inputs = processor(audios = audio_sample, return_tensors="pt")
+audio_array = model.generate(**inputs, tgt_lang="eng")
+audio_array = audio_array[0].cpu().numpy().squeeze()
+```
+Listen to the speech samples either in an ipynb notebook:
+```python
+from IPython.display import Audio
+sampling_rate = model.config.sample_rate
+Audio(audio_array, rate=sampling_rate)
+```
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+```python
+import scipy.io.wavfile
+sampling_rate = model.config.sample_rate
+scipy.io.wavfile.write("seamless_m4t_out.wav", rate=sampling_rate, data=audio_array)
 ```
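If `scipy` is not installed, the same save step can be sketched with Python's standard `wave` module. This is a minimal sketch, not part of the model card's API: the helper name `write_wav_pcm16` is hypothetical, and it assumes the audio is mono with float samples in the range [-1.0, 1.0]. A synthetic tone stands in for `audio_array`, and the 16 kHz rate is an assumption for the demo; with a real model, take the rate from the model config as in the snippets above.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for `audio_array`: one second of a 440 Hz tone.
sample_rate = 16000  # assumed for this demo only
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Written to the temp directory so the sketch runs anywhere.
out_path = os.path.join(tempfile.gettempdir(), "seamless_m4t_out_pcm16.wav")
write_wav_pcm16(out_path, tone, sample_rate)
```

To use it with generated speech, pass the squeezed `audio_array` and the model's sampling rate in place of `tone` and `sample_rate`. Converting to 16-bit PCM can also help with players that do not accept floating-point WAV files.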
 #### Tips
 For example, you can replace the previous snippet with the model dedicated to the S2ST task:
 ```python
+from transformers import SeamlessM4TForSpeechToSpeech
+model = SeamlessM4TForSpeechToSpeech.from_pretrained("ylacombe/hf-seamless-m4t-medium")
 ```
 Similarly, you can generate translated text from text or audio files, this time using the dedicated models.
 ```python
+from transformers import SeamlessM4TForSpeechToText
+model = SeamlessM4TForSpeechToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
+audio_sample = dataset["audio"][0]["array"]
+inputs = processor(audios = audio_sample, return_tensors="pt")
+output_tokens = model.generate(**inputs, tgt_lang="fra")
+translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
 ```
 And from text:
 ```python
+from transformers import SeamlessM4TForTextToText
+model = SeamlessM4TForTextToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
+inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+output_tokens = model.generate(**inputs, tgt_lang="fra")
+translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
 ```
 #### Tips