RWKV ASR adds an audio modality to the RWKV7 model while leaving the RWKV7 base model unaltered. A 0.1B RWKV model is trained to map the whisper-large-v3 encoder's latents into RWKV7's latent space, so the LLM can convert speech into text according to a text instruction. This design preserves all of the LLM's abilities and makes it easy to add further functions such as speech-to-speech, speech translation, and so on. You name it!

The architecture looks like:

(architecture diagram image)
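The latent-bridging idea above can be sketched in a few lines. This is a toy illustration, not the repo's implementation: the whisper-large-v3 encoder emits 1280-dim frames, and the 0.1B RWKV adapter (reduced here to a single linear map, with a placeholder LLM dimension of 1024) projects them into the frozen LLM's latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: whisper-large-v3 encoder outputs 1280-dim frames;
# the target LLM embedding size (1024) is a placeholder assumption.
AUDIO_DIM, LLM_DIM, T = 1280, 1024, 50

# Stand-in for the whisper encoder's output over T audio frames.
audio_latents = rng.standard_normal((T, AUDIO_DIM))

# The trained 0.1B RWKV adapter is reduced to one linear projection
# for illustration only.
W = rng.standard_normal((AUDIO_DIM, LLM_DIM)) / np.sqrt(AUDIO_DIM)
llm_latents = audio_latents @ W

# These projected latents, together with the embedded text instruction,
# would be fed to the frozen RWKV7 LLM, which decodes the transcript.
print(llm_latents.shape)  # (50, 1024)
```

Because only the adapter is trained, the base LLM's weights (and therefore its text abilities) remain untouched.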

Usage

Inference sample code is: https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper.py

  1. Download the weights in this repo. Please note: the 10k-step checkpoint was trained on around 5k hours of audio, which is a very small amount of data, and training is continuing. This also shows that the model needs relatively little data to reach a usable stage.
  2. Download the configuration directories in this repo. Assume you store them in a directory YOUR_DIR.
  3. Run the script like:
python model/test/test_asr_whisper.py --whisper_path $YOUR_DIR/whisper-large-v3/ --audio_lm_path $YOUR_DIR/rwkv7_0.1b_audio_lm_latents/ --llm_path $YOUR_DIR/rwkv7-0.4B-g1a/ --ckpt_path $YOUR_DIR/rwkvasr_whisper_10k.model.bin --audio_path new.mp3

The output looks like:

(example output screenshot)

or, in English mode:

python model/test/test_asr_whisper.py --whisper_path $YOUR_DIR/whisper-large-v3/ --audio_lm_path $YOUR_DIR/rwkv7_0.1b_audio_lm_latents/ --llm_path $YOUR_DIR/rwkv7-0.4B-g1a/ --ckpt_path $YOUR_DIR/rwkvasr_whisper_10k.model.bin --audio_path eng2.wav --language english

The output looks like:

(example output screenshot)

Alternatively, you can run inference by downloading only the trained parameters stored in rwkv7_0.1b_audio_lm_latents_150k. Download whisper-large-v3's weights from https://huggingface.co/openai/whisper-large-v3/blob/main/pytorch_model.bin and put them in the whisper directory, then download rwkv7-0.4B-g1a's weights from https://huggingface.co/fla-hub/rwkv7-0.4B-g1a/blob/main/model.safetensors and put them in the rwkv7-0.4B-g1a directory.
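With those files in place, the directory layout assumed here (names taken from the commands in this card; the exact structure is an assumption, check the loading script if it differs) would look like:

```
YOUR_DIR/
├── whisper-large-v3/
│   └── pytorch_model.bin
├── rwkv7-0.4B-g1a/
│   └── model.safetensors
└── rwkv7_0.1b_audio_lm_latents_150k/
```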

Run the script at https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper_load.py. For example:

python model/test/test_asr_whisper_load.py --whisper_path $YOUR_DIR/whisper-large-v3/ --audio_lm_path $YOUR_DIR/rwkv7_0.1b_audio_lm_latents_150k/ --llm_path $YOUR_DIR/rwkv7-0.4B-g1a/ --audio_path 918.wav

You will get a result like the one below:

(example output screenshot)

Base model: BlinkDL/rwkv7-g1