update readme: example script
README.md CHANGED
@@ -36,7 +36,7 @@ espnet_model_zoo
**The recipe can be found in ESPnet:** https://github.com/espnet/espnet/tree/master/egs2/powsm/s2t1


-### Example script
+### Example script for PR/ASR/G2P/P2G

Our models are trained on 16kHz audio with a fixed duration of 20s. When using the pre-trained model, please ensure the input speech is 16kHz and pad or truncate it to 20s.
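A minimal sketch of the pad-or-truncate step described above, assuming `numpy` and `soundfile`; the constant names and the mono-waveform assumption are illustrative, not from the model card:

```python
import numpy as np
import soundfile as sf

TARGET_SR = 16000            # POWSM expects 16kHz input
TARGET_LEN = 20 * TARGET_SR  # fixed 20s window

speech, rate = sf.read("sample.wav")  # assuming a mono (1-D) waveform
assert rate == TARGET_SR, "resample to 16kHz first (e.g. with librosa)"

# Truncate long clips and zero-pad short ones so the input is exactly 20s.
if len(speech) > TARGET_LEN:
    speech = speech[:TARGET_LEN]
else:
    speech = np.pad(speech, (0, TARGET_LEN - len(speech)))
```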
@@ -50,9 +50,8 @@ task = '<pr>'
s2t = Speech2Text.from_pretrained(
    "espnet/powsm",
    device="cuda",
-    generate_interctc_outputs=False,
    lang_sym='<eng>', # ISO 639-3; set to <unk> for unseen languages
-    task_sym=
+    task_sym=task, # <pr>, <asr>, <g2p>, <p2g>
)

speech, rate = sf.read("sample.wav") # input must already be 16kHz (resample beforehand if needed)
@@ -63,8 +62,28 @@ if task == '<pr>' or task == '<g2p>':
print(pred)
```

+#### Other tasks
+
See `force_align.py` in the [ESPnet recipe](https://github.com/espnet/espnet/tree/master/egs2/powsm/s2t1) to try out CTC forced alignment with POWSM's encoder!

+LID is learned implicitly during training, and you may run it with the script below:
+
+```python
+from espnet2.bin.s2t_inference_language import Speech2Language
+import soundfile as sf # or librosa
+
+s2t = Speech2Language.from_pretrained(
+    "espnet/powsm",
+    device="cuda",
+    nbest=1, # number of language candidates to return
+    first_lang_sym="<afr>", # fixed; defined in vocab list
+    last_lang_sym="<zul>" # fixed; defined in vocab list
+)
+
+speech, rate = sf.read("sample.wav") # input must already be 16kHz
+pred = s2t(speech)[0] # a list of (lang, prob) pairs
+print(pred)
+```
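As the comments note, `first_lang_sym` and `last_lang_sym` are fixed: they presumably mark the first and last language tokens in the vocabulary, so predictions are scored only over that span. `nbest` is the parameter you would normally change; raising it returns more (language, probability) candidates.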
### Citations