Jyhan003 committed
Commit 5f93079 · Parent: 6928a4b

docs: update README

Files changed (1): README.md (+28 −12)
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 ---

 ## Overview
- This hub features the pre-trained model by [DiariZen](https://github.com/BUTSpeechFIT/DiariZen). The EEND component is built upon WavLM-Large and Conformer layers. The model was trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse.

 Structured pruning at 80% sparsity is then applied, making the model smaller, faster, and more accurate.

@@ -39,16 +39,32 @@ diar_pipeline = DiariZenPipeline.from_pretrained(
 diar_results = diar_pipeline('audio.wav', sess_name='session_name')
 ```

- ## Results (will be updated soon)
- | Dataset | DER (%) |
- |:---------------|:-----------:|
- | AMI | 14.0 |
- | AISHELL-4 | 10.0 |
- | AliMeeting | 12.7 |
- | NOTSOFAR-1 | 19.4 |
- | MSDWild | 16.2 |
- | DIHARD3 | 18.0 |
- | RAMC | 12.2 |
- | VoxConverse | 8.8 |
 ---

 ## Overview
+ This hub features the pre-trained model from [DiariZen](https://github.com/BUTSpeechFIT/DiariZen). The EEND component is built on WavLM Large and Conformer layers. The model was trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse.

 Structured pruning at 80% sparsity is then applied, making the model smaller, faster, and more accurate.

 diar_results = diar_pipeline('audio.wav', sess_name='session_name')
 ```

+ ## Results (DER %, collar = 0 s)
+ | Dataset | [Pyannote v3.1](https://github.com/pyannote/pyannote-audio) | DiariZen |
+ |:---------------|:-----------:|:-----------:|
+ | AMI | 22.4 | 14.0 |
+ | AISHELL-4 | 12.2 | 9.8 |
+ | AliMeeting | 24.4 | 12.5 |
+ | NOTSOFAR-1 | - | 17.9 |
+ | MSDWild | 25.3 | 15.6 |
+ | DIHARD3 | 21.7 | 14.5 |
+ | RAMC | 22.2 | 11.0 |
+ | VoxConverse | 11.3 | 9.2 |
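The gap between the two systems in the table above can be summarized as relative DER reduction, (Pyannote − DiariZen) / Pyannote. A minimal sketch with the numbers copied from the table (NOTSOFAR-1 is omitted since no Pyannote v3.1 score is reported there):

```python
# Relative DER reduction of DiariZen over Pyannote v3.1,
# computed from the results table above (collar = 0 s).
pyannote = {"AMI": 22.4, "AISHELL-4": 12.2, "AliMeeting": 24.4,
            "MSDWild": 25.3, "DIHARD3": 21.7, "RAMC": 22.2,
            "VoxConverse": 11.3}
diarizen = {"AMI": 14.0, "AISHELL-4": 9.8, "AliMeeting": 12.5,
            "MSDWild": 15.6, "DIHARD3": 14.5, "RAMC": 11.0,
            "VoxConverse": 9.2}

def rel_reduction(base: float, new: float) -> float:
    """Percentage DER reduction of `new` relative to `base`."""
    return 100.0 * (base - new) / base

for name in pyannote:
    print(f"{name:12s} {rel_reduction(pyannote[name], diarizen[name]):5.1f}% lower DER")
```

For example, on AMI the reduction is 100 × (22.4 − 14.0) / 22.4 = 37.5%.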
+ ## Citation
+ If you found this work helpful, please consider citing:
+ ```
+ @inproceedings{han2025leveraging,
+   title={Leveraging self-supervised learning for speaker diarization},
+   author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
+   booktitle={Proc. ICASSP},
+   year={2025}
+ }
+ @inproceedings{han2025finetunestructuredpruningcompact,
+   title={Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization},
+   author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and {\v{C}}ernock{\'y}, Jan and Burget, Luk{\'a}{\v{s}}},
+   booktitle={Proc. INTERSPEECH},
+   year={2025}
+ }
+ ```
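The `diar_results` returned by the usage snippet above are per-session speaker segments; diarization output is conventionally exchanged in RTTM format. A minimal, stdlib-only sketch of serializing hypothetical (onset, duration, speaker) segments as RTTM lines — the segment values and helper name are invented for illustration, not DiariZen's own API:

```python
# Write diarization segments as RTTM lines:
# SPEAKER <session> 1 <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
def to_rttm(session: str, segments: list[tuple[float, float, str]]) -> str:
    """segments: (onset_seconds, duration_seconds, speaker_label) tuples."""
    lines = [
        f"SPEAKER {session} 1 {onset:.3f} {dur:.3f} <NA> <NA> {spk} <NA> <NA>"
        for onset, dur, spk in segments
    ]
    return "\n".join(lines)

# Hypothetical example segments, not real model output.
print(to_rttm("session_name", [(0.0, 2.5, "spk0"), (2.5, 1.2, "spk1")]))
```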