update readme and add plots
Signed-off-by: monica-sekoyan <[email protected]>
- .gitattributes +1 -0
- README.md +9 -9
- plots/asr.png +3 -0
- plots/en_x.png +3 -0
- plots/x_en.png +3 -0
    	
        .gitattributes
    CHANGED
    
    | @@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text | |
| 34 | 
             
            *.zst filter=lfs diff=lfs merge=lfs -text
         | 
| 35 | 
             
            *tfevents* filter=lfs diff=lfs merge=lfs -text
         | 
| 36 | 
             
            canary-1b-v2.nemo filter=lfs diff=lfs merge=lfs -text
         | 
|  | 
|  | |
| 34 | 
             
            *.zst filter=lfs diff=lfs merge=lfs -text
         | 
| 35 | 
             
            *tfevents* filter=lfs diff=lfs merge=lfs -text
         | 
| 36 | 
             
            canary-1b-v2.nemo filter=lfs diff=lfs merge=lfs -text
         | 
| 37 | 
            +
            *.png filter=lfs diff=lfs merge=lfs -text
         | 
    	
        README.md
    CHANGED
    
    | @@ -42,7 +42,7 @@ We will soon release a comprehensive **Canary-1b-v2 technical report** detailing | |
| 42 |  | 
| 43 | 
             
            ### Automatic Speech Recognition (ASR)
         | 
| 44 |  | 
| 45 | 
            -
             using MUSAN music and noise samples \[16] on the [LibriSpeech Clean test set](https://www.openslr.org/12).
         | 
| 327 | 
             
            **Metric**: Word Error Rate (**WER**)
         | 
| 328 |  | 
| 329 | 
            -
            | **SNR (dB)** | 
| 330 | 
            -
            | --------------- | 
| 331 | 
            -
            | **`Canary-1b-v2`** | 2. | 
| 332 |  | 
| 333 |  | 
| 334 | 
             
            ### Hallucination Robustness
         | 
| @@ -346,8 +346,8 @@ Number of characters per minute on [MUSAN](https://www.openslr.org/17) \[16] 48 | |
| 346 |  | 
| 347 | 
             
            | **Dataset**             | **WER ↓** |
         | 
| 348 | 
             
            | ----------------------- | --------- |
         | 
| 349 | 
            -
            | Earnings-22             |  | 
| 350 | 
            -
            | This American Life      |  | 
| 351 |  | 
| 352 | 
             
            **Note:** Presented WERs do not include Punctuation and Capitalization errors.
         | 
| 353 |  | 
|  | |
| 42 |  | 
| 43 | 
             
            ### Automatic Speech Recognition (ASR)
         | 
| 44 |  | 
| 45 | 
            +
            
         | 
| 46 |  | 
| 47 | 
             
            *Figure 1: ASR WER comparison across different models. This does not include Punctuation and Capitalisation errors.*
         | 
| 48 |  | 
|  | |
| 52 |  | 
| 53 | 
             
            #### X → English
         | 
| 54 |  | 
| 55 | 
            +
            
         | 
| 56 |  | 
| 57 | 
             
            *Figure 2: AST X → En COMET scores comparison across different models*
         | 
| 58 |  | 
| 59 | 
             
            #### English → X
         | 
| 60 |  | 
| 61 |  | 
| 62 | 
            +
            
         | 
| 63 |  | 
| 64 | 
             
            *Figure 3: AST En → X COMET scores comparison across different models*
         | 
| 65 |  | 
|  | |
| 283 |  | 
| 284 | 
             
            | **WER ↓**          | Fleurs-25 Langs       | CoVoST-13 Langs      | MLS - 6 Langs      |
         | 
| 285 | 
             
            | ---------------    | --------------------  | -------------------- | ------------------ |
         | 
| 286 | 
            +
            | **`Canary-1b-v2`** | 8.40%                 | 8.85%                | 7.27%              |
         | 
| 287 |  | 
| 288 |  | 
| 289 | 
             
            **Note:** Presented WERs do not include Punctuation and Capitalization errors.
         | 
|  | |
| 326 | 
             
            Performance across different Signal-to-Noise Ratios (SNR) using MUSAN music and noise samples \[16] on the [LibriSpeech Clean test set](https://www.openslr.org/12).
         | 
| 327 | 
             
            **Metric**: Word Error Rate (**WER**)
         | 
| 328 |  | 
| 329 | 
            +
            | **SNR (dB)**       | 100      | 10    | 5     | 0     | -5     |
         | 
| 330 | 
            +
            | ---------------    | -----    | ----- | ----- | ----- | -----  |
         | 
| 331 | 
            +
            | **`Canary-1b-v2`** | 2.18% | 2.29% | 2.80% | 5.08% | 19.38% |
         | 
| 332 |  | 
| 333 |  | 
| 334 | 
             
            ### Hallucination Robustness
         | 
|  | |
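The noise-robustness rows in this hunk report WER at fixed signal-to-noise ratios. The README does not say how the MUSAN samples were mixed into the speech, so the following is only a minimal sketch of the usual definition, SNR(dB) = 10·log10(P_speech / P_noise). The function names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the model's code or evaluation scripts.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it to `speech`.

    Assumption: both inputs are equal-length sequences of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal is silent")
    # Solve SNR(dB) = 10*log10(speech_power / (gain^2 * noise_power)) for gain.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [m - s for s, m in zip(speech, mixture)]
    speech_power = sum(s * s for s in speech) / len(speech)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(speech_power / residual_power)
```

Under this definition, a lower SNR means a larger noise gain, which is consistent with the WER rising as the SNR column moves from 100 dB down to -5 dB.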
| 346 |  | 
| 347 | 
             
            | **Dataset**             | **WER ↓** |
         | 
| 348 | 
             
            | ----------------------- | --------- |
         | 
| 349 | 
            +
            | Earnings-22             | 13.51%    |
         | 
| 350 | 
            +
            | This American Life      | 8.65%     |
         | 
| 351 |  | 
| 352 | 
             
            **Note:** Presented WERs do not include Punctuation and Capitalization errors.
         | 
| 353 |  | 
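The notes in this diff state that the reported WERs exclude punctuation and capitalization errors. The exact text normalization behind those numbers is not given here, so the sketch below only illustrates the idea: lowercase both transcripts, strip ASCII punctuation, and then compute the word-level edit distance divided by the reference length. The helper names `normalize` and `wer` are hypothetical.

```python
import string


def normalize(text):
    """Lowercase and drop ASCII punctuation so P&C differences are not counted.

    Assumption: this simple rule also removes apostrophes ("don't" -> "dont").
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With this normalization, `wer("Hello, world!", "hello world")` is zero, while a genuine word substitution such as "cat" versus "dog" still counts as an error.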
    	
        plots/asr.png
    ADDED
    
    	
        plots/en_x.png
    ADDED
    
    	
        plots/x_en.png
    ADDED
    
