readme update

README.md CHANGED
```diff
@@ -18,6 +18,7 @@ datasets:
 - MLCommons/peoples_speech
 thumbnail: null
 tags:
+- transformers
 - automatic-speech-recognition
 - speech
 - audio
```

`@@ -182,6 +183,77 @@ img {`

It is an XXL version of the FastConformer CTC [1] model (around 1.1B parameters).
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.

## Transformers

You can now run Parakeet CTC natively with [Transformers](https://github.com/huggingface/transformers) 🤗

```bash
pip install git+https://github.com/huggingface/transformers
```

<details>
  <summary>➡️ Pipeline usage</summary>

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="nvidia/parakeet-ctc-1.1b")
out = pipe("https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3")
print(out)
```

</details>
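
For recordings too long to transcribe in one pass, the same pipeline can chunk the audio and batch the chunks. A minimal sketch, assuming a local recording at the hypothetical path `audio.wav`; `chunk_length_s` and `batch_size` are standard ASR-pipeline arguments, and the values here are placeholders rather than tuned settings:

```python
from transformers import pipeline

# split long audio into 30-second windows and transcribe them in batches
pipe = pipeline(
    "automatic-speech-recognition",
    model="nvidia/parakeet-ctc-1.1b",
    chunk_length_s=30,
    batch_size=4,
)

out = pipe("audio.wav")  # hypothetical local file path
print(out["text"])
```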

<details>
  <summary>➡️ AutoModel</summary>

```python
from transformers import AutoModelForCTC, AutoProcessor
from datasets import load_dataset, Audio
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("nvidia/parakeet-ctc-1.1b")
model = AutoModelForCTC.from_pretrained("nvidia/parakeet-ctc-1.1b", dtype="auto", device_map=device)

# load a few validation clips, resampled to the rate the model expects
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
speech_samples = [el["array"] for el in ds["audio"][:5]]

inputs = processor(speech_samples, sampling_rate=processor.feature_extractor.sampling_rate)
inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs)
print(processor.batch_decode(outputs))
```

</details>
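
Because this is a CTC model, you can also skip `generate` and decode greedily from the raw logits yourself. A minimal sketch continuing from the snippet above, assuming `processor.batch_decode` collapses repeated tokens and CTC blanks the way it does for other CTC checkpoints in the library:

```python
# continuing from the AutoModel snippet above
with torch.no_grad():
    logits = model(**inputs).logits

# pick the most likely token at each frame; blank/repeat collapsing
# is left to the tokenizer inside batch_decode
predicted_ids = logits.argmax(dim=-1)
print(processor.batch_decode(predicted_ids))
```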

<details>
  <summary>➡️ Training</summary>

```python
from transformers import AutoModelForCTC, AutoProcessor
from datasets import load_dataset, Audio
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("nvidia/parakeet-ctc-1.1b")
model = AutoModelForCTC.from_pretrained("nvidia/parakeet-ctc-1.1b", dtype="auto", device_map=device)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
speech_samples = [el["array"] for el in ds["audio"][:5]]
text_samples = [el for el in ds["text"][:5]]

# passing `text` to the processor prepares the `labels` key of `inputs`
inputs = processor(audio=speech_samples, text=text_samples, sampling_rate=processor.feature_extractor.sampling_rate)
inputs.to(device, dtype=model.dtype)

outputs = model(**inputs)
outputs.loss.backward()
```

</details>
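
To turn that loss into an actual parameter update, wrap the forward and backward passes in a standard PyTorch step. A minimal sketch continuing from the snippet above; the optimizer choice and learning rate are placeholders, not a recommended recipe:

```python
# continuing from the training snippet above: a single optimization step
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # placeholder hyperparameters

model.train()
optimizer.zero_grad()
outputs = model(**inputs)  # `labels` prepared by the processor make the forward pass return a loss
outputs.loss.backward()
optimizer.step()
```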

## NVIDIA NeMo: Training

To train, fine-tune, or play with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.