Update README.md
README.md
CHANGED
|
@@ -13,3 +13,10 @@ pipeline_tag: automatic-speech-recognition
|
|
| 13 |
|
| 14 |
RWKV ASR adds an audio modality to the RWKV7 model while leaving the RWKV7 base model unaltered. We trained a 0.1B RWKV model to map the whisper-large-v3 encoder's latents into RWKV7's latent space, so that RWKV7 converts the speech into text according to the text instruction.
|
| 15 |
This design preserves all of the LLM's abilities and makes it easy to add more functions to the model, such as speech-to-speech and speech translation. You name it!
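To make the data flow concrete, here is a minimal toy sketch, not the actual implementation. A random linear projection stands in for the trained 0.1B RWKV adapter, the widths `D_AUDIO` and `D_LLM` are tiny placeholder values, and placing the instruction embeddings before the audio latents is an assumption; all function names are hypothetical.

```python
import random

D_AUDIO = 4  # toy stand-in for the Whisper encoder width
D_LLM = 3    # toy stand-in for the RWKV7 hidden width (hypothetical value)


def make_adapter(d_in, d_out, seed=0):
    """Random projection matrix; a stand-in for the trained 0.1B RWKV adapter."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_out)] for _ in range(d_in)]


def project(frames, weight):
    """Map each audio frame (length d_in) to a vector of length d_out."""
    d_out = len(weight[0])
    return [
        [sum(x * weight[i][j] for i, x in enumerate(frame)) for j in range(d_out)]
        for frame in frames
    ]


def build_inputs(instruction_embeds, audio_latents, weight):
    """Concatenate instruction embeddings with projected audio latents.

    The ordering (instruction first) is an assumption made for illustration.
    The frozen base model would consume the resulting sequence unchanged.
    """
    return instruction_embeds + project(audio_latents, weight)


adapter = make_adapter(D_AUDIO, D_LLM)
# Five fake encoder frames and two fake instruction token embeddings.
audio_latents = [[0.1 * (t + k) for k in range(D_AUDIO)] for t in range(5)]
instruction_embeds = [[0.0] * D_LLM for _ in range(2)]

inputs = build_inputs(instruction_embeds, audio_latents, adapter)
print(len(inputs), len(inputs[0]))  # prints: 7 3
```

Only the adapter has trainable parameters in this picture, which is why the base model's text abilities are untouched.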
| 16 |
+
|
| 17 |
+
|
| 18 |
+
# Usage
|
| 19 |
+
Sample inference code:
|
| 20 |
+
https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper.py
|
| 21 |
+
1. Download whisper-large-v3. Although only the encoder is needed, it is easiest to load it from the full model directory. Suppose it is stored at /home/yueyulin/models/whisper-large-v3/
|
| 22 |
+
2.
|
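Step 1 above could look roughly like the following sketch. It assumes the Hugging Face `huggingface_hub` and `transformers` packages and the Hub repository id `openai/whisper-large-v3`; the helper names `download_whisper` and `load_whisper_encoder` are hypothetical, and the linked test script may load the model differently.

```python
from pathlib import Path

# Directory named in step 1; change it to wherever you keep models.
WHISPER_DIR = Path("/home/yueyulin/models/whisper-large-v3/")
# Assumed Hugging Face Hub repository id for the checkpoint.
WHISPER_REPO = "openai/whisper-large-v3"


def download_whisper(local_dir=WHISPER_DIR, repo_id=WHISPER_REPO):
    """Fetch the full checkpoint into local_dir and return the local path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))


def load_whisper_encoder(model_dir=WHISPER_DIR):
    """Load the full model from disk and keep only its encoder."""
    from transformers import WhisperFeatureExtractor, WhisperModel

    feature_extractor = WhisperFeatureExtractor.from_pretrained(str(model_dir))
    # The whole model is loaded, then the decoder is simply not used.
    encoder = WhisperModel.from_pretrained(str(model_dir)).get_encoder()
    encoder.eval()
    return feature_extractor, encoder
```

Run `download_whisper()` once, then call `load_whisper_encoder()` to get the feature extractor and the encoder from the local directory.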