Jyhan003 committed
Commit 5f93079 · Parent: 6928a4b

docs: update README

Files changed (1): README.md (+28 −12)
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 ---

 ## Overview
- This hub features the pre-trained model by [DiariZen](https://github.com/BUTSpeechFIT/DiariZen). The EEND component is built upon WavLM-Large and Conformer layers. The model was trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse.

 Structured pruning at 80% sparsity is then applied, making the model smaller, faster, and more accurate.

@@ -39,16 +39,32 @@ diar_pipeline = DiariZenPipeline.from_pretrained(
 diar_results = diar_pipeline('audio.wav', sess_name='session_name')
 ```

- ## Results (will be updated soon)
- | Dataset | DER (%) |
- |:---------------|:-----------:|
- | AMI | 14.0 |
- | AISHELL-4 | 10.0 |
- | AliMeeting | 12.7 |
- | NOTSOFAR-1 | 19.4 |
- | MSDWild | 16.2 |
- | DIHARD3 | 18.0 |
- | RAMC | 12.2 |
- | VoxConverse | 8.8 |
 ---

 ## Overview
+ This hub features the pre-trained model from [DiariZen](https://github.com/BUTSpeechFIT/DiariZen). The EEND component is built on WavLM Large and Conformer layers. The model was trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse.

 Structured pruning at 80% sparsity is then applied, making the model smaller, faster, and more accurate.

 diar_results = diar_pipeline('audio.wav', sess_name='session_name')
 ```

+ ## Results (DER %, collar = 0 s)
+ | Dataset | [Pyannote v3.1](https://github.com/pyannote/pyannote-audio) | DiariZen |
+ |:---------------|:-----------:|:-----------:|
+ | AMI | 22.4 | 14.0 |
+ | AISHELL-4 | 12.2 | 9.8 |
+ | AliMeeting | 24.4 | 12.5 |
+ | NOTSOFAR-1 | - | 17.9 |
+ | MSDWild | 25.3 | 15.6 |
+ | DIHARD3 | 21.7 | 14.5 |
+ | RAMC | 22.2 | 11.0 |
+ | VoxConverse | 11.3 | 9.2 |
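The gap between the two systems in the table above can be summarized as relative DER reduction, (Pyannote − DiariZen) / Pyannote. A minimal sketch with the numbers copied from the table (NOTSOFAR-1 is omitted since no Pyannote v3.1 score is reported there):

```python
# Relative DER reduction of DiariZen over Pyannote v3.1,
# computed from the results table above (collar = 0 s).
pyannote = {"AMI": 22.4, "AISHELL-4": 12.2, "AliMeeting": 24.4,
            "MSDWild": 25.3, "DIHARD3": 21.7, "RAMC": 22.2,
            "VoxConverse": 11.3}
diarizen = {"AMI": 14.0, "AISHELL-4": 9.8, "AliMeeting": 12.5,
            "MSDWild": 15.6, "DIHARD3": 14.5, "RAMC": 11.0,
            "VoxConverse": 9.2}

def rel_reduction(base: float, new: float) -> float:
    """Percentage DER reduction of `new` relative to `base`."""
    return 100.0 * (base - new) / base

for name in pyannote:
    print(f"{name:12s} {rel_reduction(pyannote[name], diarizen[name]):5.1f}% lower DER")
```

For example, on AMI the reduction is 100 × (22.4 − 14.0) / 22.4 = 37.5%.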
+ ## Citation
+ If you found this work helpful, please consider citing:
+ ```
+ @inproceedings{han2025leveraging,
+   title={Leveraging self-supervised learning for speaker diarization},
+   author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
+   booktitle={Proc. ICASSP},
+   year={2025}
+ }
+ @inproceedings{han2025finetunestructuredpruningcompact,
+   title={Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization},
+   author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and {\v{C}}ernock{\'y}, Jan and Burget, Luk{\'a}{\v{s}}},
+   booktitle={Proc. INTERSPEECH},
+   year={2025}
+ }
+ ```
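The `diar_results` returned by the usage snippet above are per-session speaker segments; diarization output is conventionally exchanged in RTTM format. A minimal, stdlib-only sketch of serializing hypothetical (onset, duration, speaker) segments as RTTM lines — the segment values and helper name are invented for illustration, not DiariZen's own API:

```python
# Write diarization segments as RTTM lines:
# SPEAKER <session> 1 <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
def to_rttm(session: str, segments: list[tuple[float, float, str]]) -> str:
    """segments: (onset_seconds, duration_seconds, speaker_label) tuples."""
    lines = [
        f"SPEAKER {session} 1 {onset:.3f} {dur:.3f} <NA> <NA> {spk} <NA> <NA>"
        for onset, dur, spk in segments
    ]
    return "\n".join(lines)

# Hypothetical example segments, not real model output.
print(to_rttm("session_name", [(0.0, 2.5, "spk0"), (2.5, 1.2, "spk1")]))
```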