sumedh/wav2vec2-large-xlsr-marathi

Automatic Speech Recognition

xlsr-fine-tuning-week

sumedh committed on Mar 29, 2021

Commit c3645dc · 1 Parent(s): a34a2d0

Update README.md

Files changed (1)

README.md +4 -4

@@ -27,7 +27,7 @@ model-index:
 # Wav2Vec2-Large-XLSR-53-Marathi
 Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Marathi using the [OpenSLR SLR64](http://openslr.org/64/) dataset. When using this model, make sure that your speech input is sampled at 16kHz. This data contains only female voices but it works well for male voices too.
-**WER on the Test Set**: 12.70 %
 ## Usage
 The model can be used directly without a language model as follows, given that your dataset has Marathi `actual_text` and `path_in_folder` columns:
 ```python
@@ -68,7 +68,7 @@ processor = Wav2Vec2Processor.from_pretrained("sumedh/wav2vec2-large-xlsr-marath
 model = Wav2Vec2ForCTC.from_pretrained("sumedh/wav2vec2-large-xlsr-marathi")
 model.to("cuda")
-chars_to_ignore_regex = '[\\\\\\\\,\\\\\\\\?\\\\\\\\.\\\\\\\\!\\\\\\\\-\\\\\\\\;\\\\\\\\:\\\\\\\\"\\\\\\\\“]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
 # Preprocessing the datasets. We need to read the aduio files as arrays
 def speech_file_to_array_fn(batch):
@@ -90,7 +90,7 @@ print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"],
 ## Training
 Train-Test ratio was 90:10.
-The colab notebook used for training can be found [here](https://colab.research.google.com/drive/1wX46fjExcgU5t3AsWhSPTipWg_aMDg2f?usp=sharing).
 ## Training Config and Summary
-weights-and-biases run profile [here](https://wandb.ai/wandb/xlsr/runs/3itdhtb8/overview?workspace=user-sumedhkhodke)

 # Wav2Vec2-Large-XLSR-53-Marathi
 Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Marathi using the [OpenSLR SLR64](http://openslr.org/64/) dataset. When using this model, make sure that your speech input is sampled at 16kHz. This data contains only female voices but it works well for male voices too.
+**WER (Word Error Rate) on the Test Set**: 12.70 %
 ## Usage
 The model can be used directly without a language model as follows, given that your dataset has Marathi `actual_text` and `path_in_folder` columns:
 ```python
 model = Wav2Vec2ForCTC.from_pretrained("sumedh/wav2vec2-large-xlsr-marathi")
 model.to("cuda")
+chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
 # Preprocessing the datasets. We need to read the aduio files as arrays
 def speech_file_to_array_fn(batch):
 ## Training
 Train-Test ratio was 90:10.
+Colab training notebook can be found [here](https://colab.research.google.com/drive/1wX46fjExcgU5t3AsWhSPTipWg_aMDg2f?usp=sharing).
 ## Training Config and Summary
+weights-and-biases run summary [here](https://wandb.ai/wandb/xlsr/runs/3itdhtb8/overview?workspace=user-sumedhkhodke)
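
The `chars_to_ignore_regex` change in this commit can be sanity-checked in isolation. The following is a minimal sketch, not the README's evaluation code: the helper name `remove_special_characters` and the sample sentences are hypothetical, and the pattern is written as a raw string (an assumption of this sketch, not something the commit does) so that Python does not warn about unrecognized escapes such as `\,`.

```python
import re

# Pattern from the updated README line, written here as a raw string.
# The raw-string prefix is an assumption of this sketch; the commit uses a plain string.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“]'


def remove_special_characters(sentence: str) -> str:
    """Hypothetical helper: drop every character matched by the ignore pattern."""
    return re.sub(chars_to_ignore_regex, "", sentence)


# Hypothetical sample inputs, one Marathi and one ASCII.
print(remove_special_characters("नमस्कार, तुम्ही कसे आहात?"))
print(remove_special_characters('Hello, world! "ok"'))
```

Stripping punctuation from the reference text before scoring keeps punctuation mismatches from being counted as word errors, since a CTC model trained on normalized transcripts is unlikely to emit those characters.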