Orange/Speaker-wavLM-pro


ggmbr committed on Sep 5

Commit 8451deb · 1 Parent(s): 87d7e29

variants

Files changed (1)

README.md +5 -3

README.md CHANGED

@@ -60,15 +60,16 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me
 (EER, lower value denotes a better identification, random prediction leads to a value of 50%) and the associated threshold.
 This value can be interpreted as the ability to identify speakers only with non-timbral cues. Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of prosodic cues.
 The table below provides the EER and threshold of the different [variants](#variants) of this model.
 | Variant name| EER (%) | threshold |
 | --- | --- | --- |
 | W-PRO   | 10.68 | 0.467 |
 | WNTA128 | 5.00 | 0.282 |
-A discussion about this interpretation can be
-found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
 Please note that the EER value can vary a little depending on the `max_size` defined to reduce long audios (max 30 seconds in our case).
@@ -112,6 +113,7 @@ The table below provides a short description of the variants and their performan
 | --- | --- | --- | --- |
 | W-PRO | main    | baseline, description in paper | 250 |
 | WNTA128 | wnta128 | enriched training dataset, more conversions | 128  |
 # License

 (EER, lower value denotes a better identification, random prediction leads to a value of 50%) and the associated threshold.
 This value can be interpreted as the ability to identify speakers only with non-timbral cues. Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of prosodic cues.
+A discussion about this interpretation can be
+found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
 The table below provides the EER and threshold of the different [variants](#variants) of this model.
 | Variant name| EER (%) | threshold |
 | --- | --- | --- |
 | W-PRO   | 10.68 | 0.467 |
 | WNTA128 | 5.00 | 0.282 |
+| WNTA64 | 5.13 | 0.332 |
 Please note that the EER value can vary a little depending on the `max_size` defined to reduce long audios (max 30 seconds in our case).
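
The thresholding rule described in the README text above (two utterances count as similar in terms of prosodic cues when their cosine similarity is above the variant's threshold) can be sketched as follows. This is a minimal illustration and not the model's own API: the helper names `cosine_similarity` and `similar_prosody` are hypothetical, the toy vectors stand in for real embeddings, and only the threshold values are taken from the table in this diff.

```python
import math

# Thresholds copied from the EER table in this README diff.
THRESHOLDS = {"W-PRO": 0.467, "WNTA128": 0.282, "WNTA64": 0.332}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def similar_prosody(emb_a, emb_b, variant):
    """True when the similarity is above the chosen variant's threshold."""
    return cosine_similarity(emb_a, emb_b) > THRESHOLDS[variant]


# Toy 3-dimensional vectors standing in for real embeddings (made-up data).
emb_1 = [1.0, 0.0, 0.0]
emb_2 = [0.8, 0.6, 0.0]  # cosine similarity with emb_1 is about 0.8
emb_3 = [0.0, 1.0, 0.0]  # orthogonal to emb_1, similarity 0.0

print(similar_prosody(emb_1, emb_2, "WNTA64"))  # True
print(similar_prosody(emb_1, emb_3, "WNTA64"))  # False
```

Because each variant has its own threshold, the threshold should be looked up for the same variant that produced the embeddings being compared.
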
 | --- | --- | --- | --- |
 | W-PRO | main    | baseline, description in paper | 250 |
 | WNTA128 | wnta128 | enriched training dataset, more conversions | 128  |
+| WNTA64 | wnta64 | enriched training dataset, more conversions | 64  |
 # License