Update README.md

library_name: espnet
pipeline_tag: automatic-speech-recognition
---

🏆 **News:** Our [OWSM v4 paper](https://www.isca-archive.org/interspeech_2025/peng25c_interspeech.html) won the [Best Student Paper Award](https://isca-speech.org/ISCA-Awards) at INTERSPEECH 2025!

[Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/) is the first **fully open** Whisper-style speech foundation model.
It reproduces and advances OpenAI's Whisper-style training using publicly available data and open-source toolkits.
The code, pre-trained model weights, and training logs are publicly released to promote open science in speech foundation models.

Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).

Additionally, OWSM v4 applies 8× subsampling to the log-Mel features (instead of the 4× used in OWSM v3.1), giving the encoder a final temporal resolution of 80 ms (8 × the 10 ms feature frame shift).
When running inference, we recommend keeping `maxlenratio=1.0` (the default) rather than smaller values: the maximum decoding length is `maxlenratio` × the number of encoder frames, which is already small after 8× subsampling, so smaller values can truncate long transcripts.
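
As a quick illustration of this recommendation, here is a minimal inference sketch using ESPnet's `espnet2.bin.s2t_inference.Speech2Text` API (the usual entry point for OWSM models); the model tag and audio path below are placeholders, not the official snippet, so substitute this repo's actual id:

```python
# Minimal sketch (placeholder names): OWSM inference via ESPnet's s2t_inference API.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

model = Speech2Text.from_pretrained(
    "espnet/owsm_v4_medium_1B",  # placeholder: use this repo's id
    device="cuda",               # or "cpu"
    lang_sym="<eng>",            # source language token
    task_sym="<asr>",            # speech recognition task
    beam_size=5,
    maxlenratio=1.0,             # keep the default; smaller values may truncate output
)

speech, rate = sf.read("speech.wav")  # placeholder path; 16 kHz mono audio
text, *_ = model(speech)[0]           # top beam-search hypothesis
print(text)
```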
This repo contains a medium-sized model with 1B parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
It is trained on 320k hours of public speech data.
The newly curated data are publicly released: https://huggingface.co/datasets/espnet/yodas_owsmv4
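
To fetch the released data, a hedged sketch follows: the download call is the standard `huggingface_hub` API, but the data format itself is documented on the dataset card, which you should consult before parsing the files.

```python
# Sketch: download the released dataset files locally. snapshot_download works
# for any Hugging Face dataset repo; see the dataset card for the file format.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("espnet/yodas_owsmv4", repo_type="dataset")
print(local_dir)  # root directory of the downloaded files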
The model supports the following speech-to-text tasks:
- Language identification