update README with training details
- README.md +28 -4
- validation_wer.png +0 -0
README.md
CHANGED
|
@@ -25,13 +25,16 @@ model-index:
|
|
| 25 |
- name: Test WER
|
| 26 |
type: wer
|
| 27 |
value: 26.55
|
| 28 |
---
|
| 29 |
|
| 30 |
# Wav2Vec2-Large-XLSR-53-Arabic
|
| 31 |
|
| 32 |
Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
|
| 33 |
-
on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice)
|
| 34 |
-
and
|
| 35 |
When using this model, make sure that your speech input is sampled at 16kHz.
|
| 36 |
|
| 37 |
## Usage
|
|
@@ -174,5 +177,26 @@ print(f"WER: {metrics['wer']:.2%}")
|
|
| 174 |
|
| 175 |
## Training
|
| 176 |
|
| 177 |
-
|
| 178 |
-
|
| 25 |
- name: Test WER
|
| 26 |
type: wer
|
| 27 |
value: 26.55
|
| 28 |
+
- name: Validation WER
|
| 29 |
+
type: wer
|
| 30 |
+
value: 23.39
|
| 31 |
---
|
| 32 |
|
| 33 |
# Wav2Vec2-Large-XLSR-53-Arabic
|
| 34 |
|
| 35 |
Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
|
| 36 |
+
on Arabic using the `train` splits of [Common Voice](https://huggingface.co/datasets/common_voice)
|
| 37 |
+
and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
|
| 38 |
When using this model, make sure that your speech input is sampled at 16kHz.
|
| 39 |
|
| 40 |
## Usage
|
|
|
|
| 177 |
|
| 178 |
## Training
|
| 179 |
|
| 180 |
+
For more details, see [Fine-Tuning with Arabic Speech Corpus](https://github.com/huggingface/transformers/tree/1c06240e1b3477728129bb58e7b6c7734bb5074e/examples/research_projects/wav2vec2#fine-tuning-with-arabic-speech-corpus).
|
| 181 |
+
|
| 182 |
+
This model represents Arabic in a format called [Buckwalter transliteration](https://en.wikipedia.org/wiki/Buckwalter_transliteration).
|
| 183 |
+
The Buckwalter format uses only ASCII characters, some of which are non-alphabetic (e.g., `">"` maps to `"أ"`).
|
| 184 |
+
The [lang-trans](https://github.com/kariminf/lang-trans) package is used to transliterate Arabic script (an abjad) into this format.
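
To make the mapping concrete, here is a minimal sketch of Buckwalter transliteration over a small, hand-picked subset of the table. It is for illustration only: the names `BUCKWALTER_SUBSET`, `transliterate`, and `untransliterate` are defined here and are not claimed to match the lang-trans implementation, which covers the full table.

```python
# Illustrative subset of the Buckwalter table. The full table covers every
# Arabic letter and diacritic; this is NOT the lang-trans implementation.
BUCKWALTER_SUBSET = {
    "\u0623": ">",  # أ alef with hamza above
    "\u0627": "A",  # ا alef
    "\u0628": "b",  # ب beh
    "\u062D": "H",  # ح hah
    "\u062F": "d",  # د dal
    "\u0634": "$",  # ش sheen
    "\u0645": "m",  # م meem
}

# Reverse table for converting model output back to Arabic script.
ARABIC_FROM_BUCKWALTER = {latin: arabic for arabic, latin in BUCKWALTER_SUBSET.items()}


def transliterate(text: str) -> str:
    """Arabic script -> Buckwalter; characters outside the subset pass through."""
    return "".join(BUCKWALTER_SUBSET.get(ch, ch) for ch in text)


def untransliterate(text: str) -> str:
    """Buckwalter -> Arabic script; characters outside the subset pass through."""
    return "".join(ARABIC_FROM_BUCKWALTER.get(ch, ch) for ch in text)


word = "\u0623\u062D\u0645\u062F"  # أحمد
encoded = transliterate(word)
print(encoded)  # prints >Hmd
assert untransliterate(encoded) == word  # the mapping is one-to-one
```

Because the mapping is one-to-one, text predicted in Buckwalter can be converted back to Arabic script without loss.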
|
| 185 |
+
|
| 186 |
+
[This script](https://github.com/huggingface/transformers/blob/1c06240e1b3477728129bb58e7b6c7734bb5074e/examples/research_projects/wav2vec2/finetune_large_xlsr_53_arabic_speech_corpus.sh)
|
| 187 |
+
was used to first fine-tune [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
|
| 188 |
+
on the `train` split of the [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus) dataset;
|
| 189 |
+
the `validation` split was used for model selection; the resulting model at this point is saved as [elgeish/wav2vec2-large-xlsr-53-levantine-arabic](https://huggingface.co/elgeish/wav2vec2-large-xlsr-53-levantine-arabic).
|
| 190 |
+
|
| 191 |
+
Training was then resumed using the `train` split of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset;
|
| 192 |
+
similarly, the `validation` split was used for model selection;
|
| 193 |
+
training was stopped to meet the deadline of [Fine-Tune-XLSR Week](https://github.com/huggingface/transformers/blob/700229f8a4003c4f71f29275e0874b5ba58cd39d/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md):
|
| 194 |
+
this model is the checkpoint at 100k steps, with a validation WER of **23.39%**.
|
| 195 |
+
|
| 196 |
+
<img src="validation_wer.png" alt="Validation WER" width="50%" />
|
| 197 |
+
|
| 198 |
+
Validation WER was still trending down when training stopped, which suggests that further training could help (resuming with the decaying learning rate at 7e-6).
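
For reference, word error rate (WER) is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; the function name `word_error_rate` is hypothetical, and this is not necessarily the implementation used to produce the numbers reported here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(f"{word_error_rate('a b c d', 'a x c'):.2%}")  # prints 50.00%
```

Note that a corpus-level WER is usually computed by summing errors and reference lengths over all utterances, not by averaging per-utterance scores.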
|
| 199 |
+
|
| 200 |
+
## Future Work
|
| 201 |
+
One area to explore is using `attention_mask` in model input, which is recommended [here](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2).
|
| 202 |
+
Another area to explore is data augmentation using the datasets that were used to train the models listed [here](https://paperswithcode.com/sota/speech-recognition-on-common-voice-arabic).
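
As background for the `attention_mask` idea, the sketch below shows what such a mask encodes when clips of different lengths are padded into one batch: `1` marks real samples and `0` marks padding that the model should ignore. It is a plain-Python illustration under stated assumptions; the helper `pad_with_attention_mask` is hypothetical, and in practice the mask would be built by the feature extractor, not by hand.

```python
from typing import List, Tuple


def pad_with_attention_mask(
    batch: List[List[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[List[int]]]:
    """Right-pad every clip to the longest one and mark real samples with 1."""
    max_len = max(len(clip) for clip in batch)
    padded, mask = [], []
    for clip in batch:
        n_pad = max_len - len(clip)
        padded.append(list(clip) + [pad_value] * n_pad)
        mask.append([1] * len(clip) + [0] * n_pad)
    return padded, mask


# Two toy "waveforms" of different lengths.
values, mask = pad_with_attention_mask([[0.1, -0.2, 0.3], [0.5]])
print(values)  # prints [[0.1, -0.2, 0.3], [0.5, 0.0, 0.0]]
print(mask)    # prints [[1, 1, 1], [1, 0, 0]]
```

Without the mask, the padded zeros of the shorter clip are indistinguishable from silence, which is one reason passing the mask to the model may matter for batched input.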
|
validation_wer.png
ADDED
|