Whisper long-form recognition not used

#45
by albertzeyer - opened

Are you checking the GitHub issues? I already reported this here: https://github.com/huggingface/open_asr_leaderboard/issues/100

Specifically, Whisper's long-form recognition is not used in the code. That is, for audio longer than 30 seconds, everything after the first 30 seconds is simply ignored. In most of the benchmark datasets, most sequences are shorter than 30 seconds, which is why the problem is not very noticeable.
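To make the effect concrete, here is a minimal sketch of how much audio gets dropped when only the first 30-second window is decoded, and what covering the whole recording would look like. This is not the leaderboard's code and not Whisper's actual long-form algorithm (which shifts the window based on predicted timestamps). The fixed, non-overlapping windows, the 16 kHz sample rate, and the helper names `truncated_fraction` and `split_into_windows` are all assumptions made for illustration.

```python
SAMPLE_RATE = 16_000   # assumed input sample rate in Hz
WINDOW_SECONDS = 30    # length of a single decoding window in seconds


def truncated_fraction(duration_seconds, window_seconds=WINDOW_SECONDS):
    """Fraction of the audio that is lost if only the first window is decoded."""
    if duration_seconds <= window_seconds:
        return 0.0
    return (duration_seconds - window_seconds) / duration_seconds


def split_into_windows(num_samples, sample_rate=SAMPLE_RATE,
                       window_seconds=WINDOW_SECONDS):
    """Return (start, end) sample ranges that cover the whole recording.

    Simplification: fixed, non-overlapping windows. A real long-form
    decoder would choose the next window start from predicted timestamps.
    """
    window = sample_rate * window_seconds
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


if __name__ == "__main__":
    duration = 45  # seconds, a hypothetical long utterance
    print(truncated_fraction(duration))
    print(split_into_windows(duration * SAMPLE_RATE))
```

For a 45-second utterance, decoding only the first window drops a third of the audio, and every word in the dropped part would count as a deletion when computing WER.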

How exactly are the numbers produced? Where can I see the corresponding code?
