5roop committed · Commit b18a8c6 · verified · 1 Parent(s): 5e75061

Update README.md

Files changed (1)
  1. README.md +14 -20
README.md CHANGED
@@ -34,6 +34,18 @@ presence of filled pauses ("eee", "errm", ...).
  - **Language(s) (NLP):** Trained and tested on Slovenian [ROG-Artur](http://hdl.handle.net/11356/1992), evaluated also on Croatian, Serbian, Polish, and Czech samples from the [ParlaSpeech corpus](http://clarinsi.github.io/parlaspeech)
  - **Finetuned from model:** facebook/w2v-bert-2.0
 
+ ## Model reference
+ If you wish to cite this model, use
+ ```bibtex
+ @misc{wav2vecbert2-filledPause,
+ author = { Rupnik, Peter and Ljubešić, Nikola and Porupski, Ivan and Verdonik, Darinka },
+ title = { wav2vecbert2-filledPause (Revision 5e75061) },
+ year = 2025,
+ url = { https://huggingface.co/classla/wav2vecbert2-filledPause },
+ doi = { 10.57967/hf/6732 },
+ publisher = { Hugging Face }
+ }
+ ```
 
  ## Paper
  ```bibtex
@@ -55,6 +67,8 @@ presence of filled pauses ("eee", "errm", ...).
  ```
 
 
+
+
  # Training data
 
  The model was trained on human-annotated Slovenian speech corpus
@@ -223,25 +237,5 @@ print(ds["intervals"][0])
  # [[0.08, 0.28 ], ...]
  ```
 
- ## Paper
-
- Please cite the following paper:
- ```bibtex
- @inproceedings{ljubesic-etal-2025-identifying,
- title = "Identifying Filled Pauses in Speech Across South and {W}est {S}lavic Languages",
- author = "Ljube{\v{s}}i{\'c}, Nikola and Porupski, Ivan and Rupnik, Peter",
- editor = "Piskorski, Jakub and P{\v{r}}ib{\'a}{\v{n}}, Pavel and Nakov, Preslav and Yangarber, Roman and Marcinczuk, Michal",
- booktitle = "Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)",
- month = jul,
- year = "2025",
- address = "Vienna, Austria",
- publisher = "Association for Computational Linguistics",
- url = "https://aclanthology.org/2025.bsnlp-1.1/",
- doi = "10.18653/v1/2025.bsnlp-1.1",
- pages = "1--8",
- ISBN = "978-1-959429-57-9",
- abstract = "Filled pauses are among the most common paralinguistic features of speech, yet they are mainly omitted from transcripts. We propose a transformer-based approach for detecting filled pauses directly from the speech signal, fine-tuned on Slovenian and evaluated across South and West Slavic languages. Our results show that speech transformers achieve excellent performance in detecting filled pauses when evaluated in the in-language scenario. We further evaluate cross-lingual capabilities of the model on two closely related South Slavic languages (Croatian and Serbian) and two less closely related West Slavic languages (Czech and Polish). Our results reveal strong cross-lingual generalization capabilities of the model, with only minor performance drops. Moreover, error analysis reveals that the model outperforms human annotators in recall and F1 score, while trailing slightly in precision. In addition to evaluating the capabilities of speech transformers for filled pause detection across Slavic languages, we release new multilingual test datasets and make our fine-tuned model publicly available to support further research and applications in spoken language processing."
- }
- ```
 
 