Hervé BREDIN committed
Commit: acf591c
Parent(s): e4aa6b9

doc: update README

README.md CHANGED
@@ -26,11 +26,7 @@ We propose (paid) scientific [consulting services](https://herve.niderb.fr/consu
 
 # 🎹 "Powerset" speaker segmentation
 
-
-
-It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
-
-It ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
+This model ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
 
 [image]
 
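As an aside to the hunk above: the 7 powerset classes it lists each correspond to a set of active speakers. A minimal, hypothetical Python sketch of that mapping (the list order follows the README's wording; names are illustrative, not taken from pyannote.audio's code):

```python
# Hypothetical sketch: map each powerset class index to the set of active
# speakers, in the order the README lists them (non-speech, speaker #1,
# speaker #2, speaker #3, #1+#2, #1+#3, #2+#3).
POWERSET_CLASSES = [
    set(),     # 0: non-speech
    {1},       # 1: speaker #1
    {2},       # 2: speaker #2
    {3},       # 3: speaker #3
    {1, 2},    # 4: speakers #1 and #2
    {1, 3},    # 5: speakers #1 and #3
    {2, 3},    # 6: speakers #2 and #3
]

def active_speakers(frame_class: int) -> set:
    """Return the set of speakers active in a frame, given its powerset class."""
    return POWERSET_CLASSES[frame_class]
```

For example, a frame assigned class 4 means speakers #1 and #2 overlap, while class 0 means nobody is speaking.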
@@ -51,6 +47,12 @@ to_multilabel = Powerset(
 multilabel_encoding = to_multilabel(powerset_encoding)
 ```
 
+The various concepts behind this model are described in detail in this [paper](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html).
+
+It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
+
+This [companion repository](https://github.com/FrenchKrab/IS2023-powerset-diarization/) by [Alexis Plaquet](https://frenchkrab.github.io/) also provides instructions on how to train or finetune such a model on your own data.
+
 ## Usage
 
 ```python
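For readers skimming the diff: the powerset-to-multilabel conversion used above (`to_multilabel = Powerset(...)`) can be illustrated with a small NumPy sketch. The class order and function name are assumptions for illustration, not pyannote.audio's actual implementation:

```python
import numpy as np

# Assumed class order, matching the README: non-speech, spk1, spk2, spk3,
# spk1+2, spk1+3, spk2+3 (speakers are 0-indexed here).
CLASS_TO_SPEAKERS = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]

def powerset_to_multilabel(powerset: np.ndarray, num_speakers: int = 3) -> np.ndarray:
    """Convert (num_frames, 7) powerset scores into a (num_frames, num_speakers)
    binary multilabel matrix by taking the winning class per frame."""
    classes = powerset.argmax(axis=-1)  # winning powerset class per frame
    multilabel = np.zeros((len(classes), num_speakers))
    for frame, cls in enumerate(classes):
        for speaker in CLASS_TO_SPEAKERS[cls]:
            multilabel[frame, speaker] = 1.0
    return multilabel

# One frame of overlapping speakers #1 and #2 (class 4), one frame of
# non-speech (class 0), encoded as one-hot powerset scores:
scores = np.eye(7)[[4, 0]]
print(powerset_to_multilabel(scores))
```

The actual `Powerset` module in pyannote.audio performs this conversion on tensors; the sketch only shows the underlying idea of expanding a single exclusive class into per-speaker activity.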
@@ -64,7 +66,7 @@ model = Model.from_pretrained("pyannote/segmentation-3.0.0",
 
 ### Speaker diarization
 
-This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s
+This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).
 
 See [pyannote/speaker-diarization-3.0.0](https://hf.co/pyannote/speaker-diarization-3.0.0) pipeline that uses an additional speaker embedding model to perform full recording speaker diarization.
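As background for the hunk above: since the model only processes 10s chunks, a full recording must be cut into windows before inference. A hedged sketch of such chunking (the 10s window and 5s hop are illustrative values, not the pipeline's actual settings):

```python
import numpy as np

def sliding_chunks(waveform: np.ndarray, sample_rate: int = 16000,
                   duration: float = 10.0, step: float = 5.0):
    """Yield (start_time, chunk) pairs cut from a mono waveform with a
    sliding window (duration and step in seconds; values are assumptions)."""
    size, hop = int(duration * sample_rate), int(step * sample_rate)
    for start in range(0, max(len(waveform) - size, 0) + 1, hop):
        yield start / sample_rate, waveform[start:start + size]

# A 25s recording at 16kHz yields 10s windows starting at 0s, 5s, 10s, 15s:
starts = [t for t, _ in sliding_chunks(np.zeros(25 * 16000))]
print(starts)  # → [0.0, 5.0, 10.0, 15.0]
```

The referenced pyannote/speaker-diarization-3.0.0 pipeline then stitches per-chunk predictions back together using a speaker embedding model; that aggregation step is what this segmentation model alone does not provide.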