Audio Classification
Safetensors
wav2vec2-bert
Commit 441988b by 5roop (verified), 1 parent: 19b77f2

Update README.md

  1. README.md +18 -16
README.md CHANGED
@@ -18,6 +18,24 @@ metrics:
 
  # Frame classification for filled pauses
 
+
+ ## Model Details
+
+ This model classifies individual 20ms frames of audio based on
+ presence of filled pauses ("eee", "errm", ...).
+
+ ### Model Description
+
+
+
+ - **Developed by:** Peter Rupnik, Nikola Ljubešić, Darinka Verdonik, Simona
+ Majhenič
+ - **Funded by:** MEZZANINE project
+ - **Model type:** Wav2Vec2Bert for Audio Frame Classification
+ - **Language(s) (NLP):** Trained and tested on Slovenian [ROG-Artur](http://hdl.handle.net/11356/1992), evaluated also on Croatian, Serbian, Polish, and Czech samples from the [ParlaSpeech corpus](http://clarinsi.github.io/parlaspeech)
+ - **Finetuned from model:** facebook/w2v-bert-2.0
+
+
  ## Paper
  ```bibtex
  @inproceedings{ljubesic-etal-2025-identifying,
@@ -36,22 +54,6 @@ metrics:
  abstract = "Filled pauses are among the most common paralinguistic features of speech, yet they are mainly omitted from transcripts. We propose a transformer-based approach for detecting filled pauses directly from the speech signal, fine-tuned on Slovenian and evaluated across South and West Slavic languages. Our results show that speech transformers achieve excellent performance in detecting filled pauses when evaluated in the in-language scenario. We further evaluate cross-lingual capabilities of the model on two closely related South Slavic languages (Croatian and Serbian) and two less closely related West Slavic languages (Czech and Polish). Our results reveal strong cross-lingual generalization capabilities of the model, with only minor performance drops. Moreover, error analysis reveals that the model outperforms human annotators in recall and F1 score, while trailing slightly in precision. In addition to evaluating the capabilities of speech transformers for filled pause detection across Slavic languages, we release new multilingual test datasets and make our fine-tuned model publicly available to support further research and applications in spoken language processing."
  }
  ```
- ## Model Details
-
- This model classifies individual 20ms frames of audio based on
- presence of filled pauses ("eee", "errm", ...).
-
- ### Model Description
-
-
-
- - **Developed by:** Peter Rupnik, Nikola Ljubešić, Darinka Verdonik, Simona
- Majhenič
- - **Funded by:** MEZZANINE project
- - **Model type:** Wav2Vec2Bert for Audio Frame Classification
- - **Language(s) (NLP):** Trained and tested on Slovenian [ROG-Artur](http://hdl.handle.net/11356/1992), evaluated also on Croatian, Serbian, Polish, and Czech samples from the [ParlaSpeech corpus](http://clarinsi.github.io/parlaspeech)
- - **Finetuned from model:** facebook/w2v-bert-2.0
-
 
 
  # Training data