yueyulin/rwkv_asr

@@ -21,11 +21,11 @@ The architect looks like:
 # Usage
 Inference sample code is:
 https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper.py
-1. Download whisper_large_v3, although we only need encoder part, it's still easy to load from the model directory. Supposely we store it to /home/yueyulin/models/whisper-large-v3/
-2. Download the weights in this repo. Please note: 10k steps checkpoint training costs around 5k hours which is a very small amount of data and we are continuing training. Also it proves this mode needs less data to achieve a usable stage.
-3. Run the script like:
 ```bash
-python model/test/test_asr_whisper.py --audio_lm_path /home/yueyulin/models/rwkv7_0.1b_audio_lm_latents/ --llm_path /home/yueyulin/models/rwkv7-0.4B-g1a/ --ckpt_path /home/yueyulin/rwkvasr_whisper_10k.model.bin --audio_path new.mp3
 ```
 The output looks like:
@@ -33,7 +33,7 @@ The output looks like:
 or in English mode
 ```bash
-python model/test/test_asr_whisper.py --audio_lm_path /home/yueyulin/models/rwkv7_0.1b_audio_lm_latents/ --llm_path /home/yueyulin/models/rwkv7-0.4B-g1a/ --ckpt_path /home/yueyulin/rwkvasr_whisper_10k.model.bin --audio_path eng2.wav --language english
 ```
 The output looks like:

 # Usage
 Inference sample code is:
 https://github.com/yynil/RWKVTTS/blob/respark/model/test/test_asr_whisper.py
+1. Download the weights in this repo. Please note: the 10k-step checkpoint used around 5k hours of training data, which is a very small amount, and we are continuing training. It also shows that this model needs less data to reach a usable stage.
+2. Download the configuration directories in this repo. We assume you store them in a directory called YOUR_DIR.
+3. Run the script like:
 ```bash
+python model/test/test_asr_whisper.py --whisper_path $YOUR_DIR/whisper-large-v3/ --audio_lm_path $YOUR_DIR/rwkv7_0.1b_audio_lm_latents/ --llm_path $YOUR_DIR/rwkv7-0.4B-g1a/ --ckpt_path $YOUR_DIR/rwkvasr_whisper_10k.model.bin --audio_path new.mp3
 ```
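Because the command above expects three model directories and one checkpoint file under `$YOUR_DIR`, a quick pre-flight check can catch a missing download before the model starts loading. The following is a minimal sketch only: the function name `check_asr_layout` is hypothetical and not part of the repository, and the entry names are simply copied from the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): verify that the directory
# passed as $1 contains every path the inference command refers to.
# Entry names are taken from the command line shown above.
check_asr_layout() {
    local base="$1"
    local missing=0
    local entry

    # The three directories passed via --whisper_path, --audio_lm_path, --llm_path.
    for entry in whisper-large-v3 rwkv7_0.1b_audio_lm_latents rwkv7-0.4B-g1a; do
        if [ ! -d "$base/$entry" ]; then
            echo "missing directory: $base/$entry" >&2
            missing=1
        fi
    done

    # The checkpoint file passed via --ckpt_path.
    if [ ! -f "$base/rwkvasr_whisper_10k.model.bin" ]; then
        echo "missing checkpoint: $base/rwkvasr_whisper_10k.model.bin" >&2
        missing=1
    fi

    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After sourcing the function, `check_asr_layout "$YOUR_DIR"` returns a non-zero status and lists every missing path on stderr, so it can be chained with `&&` in front of the inference command.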
 The output looks like:
 or in English mode
 ```bash
+python model/test/test_asr_whisper.py --whisper_path $YOUR_DIR/whisper-large-v3/ --audio_lm_path $YOUR_DIR/rwkv7_0.1b_audio_lm_latents/ --llm_path $YOUR_DIR/rwkv7-0.4B-g1a/ --ckpt_path $YOUR_DIR/rwkvasr_whisper_10k.model.bin --audio_path eng2.wav --language english
 ```
 The output looks like: