yueyulin/rwkv_asr

@@ -13,3 +13,10 @@ pipeline_tag: automatic-speech-recognition
 RWKV ASR adds an audio modality to the RWKV7 model while leaving the RWKV7 base model unaltered. A 0.1B RWKV model is trained to map the whisper-large-v3 encoder's latents into RWKV7's latent space, and the base model then converts the speech into text according to the text instruction.
 This design keeps all the abilities of the LLM and makes it easy to add more functions to the model, such as speech-to-speech and speech translation. You name it!

+# Usage
+Sample inference code is available at:
+https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper.py
+1. Download whisper-large-v3. Only the encoder part is needed, but it is still easiest to load it from the full model directory. Suppose it is stored in /home/yueyulin/models/whisper-large-v3/
+2.
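
As a rough illustration of step 1, the sketch below loads the full Whisper checkpoint from a local directory and keeps only the encoder. It assumes the Hugging Face `transformers` library is installed. The helper name `load_whisper_encoder` is hypothetical and is not taken from the linked sample script, which may load the model differently.

```python
from pathlib import Path


def load_whisper_encoder(model_dir: str):
    """Return (feature_extractor, encoder) loaded from a local Whisper directory.

    Hypothetical helper for illustration; not part of the linked sample script.
    """
    path = Path(model_dir)
    # Fail early with a clear message if the checkpoint has not been downloaded.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(
            f"No config.json found in {path}; download whisper-large-v3 first"
        )

    # Imported lazily so the path check above works without transformers installed.
    from transformers import WhisperFeatureExtractor, WhisperModel

    feature_extractor = WhisperFeatureExtractor.from_pretrained(str(path))
    model = WhisperModel.from_pretrained(str(path))

    # The full model is loaded for convenience, but only the encoder is kept.
    encoder = model.get_encoder()
    encoder.eval()
    return feature_extractor, encoder


# Example call, using the directory from step 1:
# feature_extractor, encoder = load_whisper_encoder(
#     "/home/yueyulin/models/whisper-large-v3/"
# )
```

Loading the whole checkpoint and discarding the decoder costs some extra memory at load time, but avoids having to split the checkpoint files by hand.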