ggmbr committed on
Commit 8451deb · 1 Parent(s): 87d7e29
Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -60,15 +60,16 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me
 (EER, lower value denotes a better identification, random prediction leads to a value of 50%) and the associated threshold.
 This value can be interpreted as the ability to identify speakers only with non-timbral cues. Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of prosodic cues.
 
+A discussion about this interpretation can be
+found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
+
 The table below provides the EER and threshold of the different [variants](#variants) of this model.
 
 | Variant name| EER (%) | threshold |
 | --- | --- | --- |
 | W-PRO | 10.68 | 0.467 |
 | WNTA128 | 5.00 | 0.282 |
-
-A discussion about this interpretation can be
-found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
+| WNTA64 | 5.13 | 0.332 |
 
 Please note that the EER value can vary a little depending on the `max_size` defined to reduce long audios (max 30 seconds in our case).
 
@@ -112,6 +113,7 @@ The table below provides a short description of the variants and their performan
 | --- | --- | --- | --- |
 | W-PRO | main | baseline, description in paper | 250 |
 | WNTA128 | wnta128 | enriched training dataset, more conversions | 128 |
+| WNTA64 | wnta64 | enriched training dataset, more conversions | 64 |
 
 # License
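
Note on the thresholds touched by this change: the README says that two utterances whose cosine similarity is above a variant's threshold should be considered similar in terms of prosodic cues. The snippet below is a minimal sketch of that decision rule, not code from the repository. The threshold values are copied from the EER table in this diff; the names `THRESHOLDS`, `cosine_similarity` and `similar_prosody` are hypothetical, and the embeddings are assumed to be plain non-zero vectors already extracted by the model.

```python
import numpy as np

# Thresholds copied from the EER table in this diff (VoxCeleb1-clean test set).
THRESHOLDS = {"W-PRO": 0.467, "WNTA128": 0.282, "WNTA64": 0.332}


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed non-zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def similar_prosody(emb_a, emb_b, variant):
    """Hypothetical helper: True when the similarity exceeds the variant's threshold."""
    return cosine_similarity(emb_a, emb_b) > THRESHOLDS[variant]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    emb_a = [1.0, 0.0]
    emb_b = [1.0, 1.0]
    print(similar_prosody(emb_a, emb_b, "WNTA64"))
```

Since each variant has its own threshold, a score computed with one variant should only be compared against that variant's row of the table.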