Hervé BREDIN committed
Commit acf591c · 1 Parent(s): e4aa6b9

doc: update README

Files changed (1): README.md (+8 -6)
README.md CHANGED

@@ -26,11 +26,7 @@ We propose (paid) scientific [consulting services](https://herve.niderb.fr/consu
 
 # 🎹 "Powerset" speaker segmentation
 
-The various concepts behind this model are described in details in this [paper](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html).
-
-It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
-
-It ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
+This model ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
 
 ![Example output](example.png)
 
@@ -51,6 +47,12 @@ to_multilabel = Powerset(
 multilabel_encoding = to_multilabel(powerset_encoding)
 ```
 
+The various concepts behind this model are described in details in this [paper](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html).
+
+It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
+
+This [companion repository](https://github.com/FrenchKrab/IS2023-powerset-diarization/) by [Alexis Plaquet](https://frenchkrab.github.io/) also provides instructions on how to train or finetune such a model on your own data.
+
 ## Usage
 
 ```python
@@ -64,7 +66,7 @@ model = Model.from_pretrained("pyannote/segmentation-3.0.0",
 
 ### Speaker diarization
 
-This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunk).
+This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).
 
 See [pyannote/speaker-diarization-3.0.0](https://hf.co/pyannote/speaker-diarization-3.0.0) pipeline that uses an additional speaker embedding model to perform full recording speaker diarization.
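
The 7-class powerset encoding described in the diff (non-speech, each single speaker, and each pair of speakers) can be sketched as a lookup table. This is a hypothetical illustration of the encoding only, not pyannote's `Powerset` implementation; the `POWERSET_CLASSES` table and `powerset_to_multilabel` helper are assumptions for this sketch:

```python
import numpy as np

# Hypothetical mapping from powerset class index to the set of active
# speakers, following the 7 classes listed in the README (at most
# 3 speakers, at most 2 of them overlapping).
POWERSET_CLASSES = [
    set(),    # 0: non-speech
    {0},      # 1: speaker #1
    {1},      # 2: speaker #2
    {2},      # 3: speaker #3
    {0, 1},   # 4: speakers #1 and #2
    {0, 2},   # 5: speakers #1 and #3
    {1, 2},   # 6: speakers #2 and #3
]

def powerset_to_multilabel(powerset: np.ndarray) -> np.ndarray:
    """Convert (num_frames, 7) powerset scores into a (num_frames, 3)
    binary multilabel matrix, one column per speaker."""
    winners = powerset.argmax(axis=1)  # winning powerset class per frame
    multilabel = np.zeros((len(winners), 3), dtype=int)
    for frame, klass in enumerate(winners):
        for speaker in POWERSET_CLASSES[klass]:
            multilabel[frame, speaker] = 1
    return multilabel
```

In practice the README's own snippet does this conversion with `pyannote.audio`'s `Powerset` module; the table above only makes the class layout explicit.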
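
Since the model only processes 10s chunks, any full-recording pipeline must first slide a window over the audio before stitching local outputs together (which is what the referenced speaker-diarization pipeline adds a speaker embedding model for). A minimal sketch of that chunking step, assuming 16 kHz mono input and a hypothetical 50% window overlap:

```python
import numpy as np

SAMPLE_RATE = 16_000       # the model expects 16 kHz mono audio
CHUNK = 10 * SAMPLE_RATE   # 10-second windows, per the README
STEP = 5 * SAMPLE_RATE     # hypothetical 50% overlap between windows

def sliding_chunks(waveform: np.ndarray):
    """Yield (start_sample, chunk) pairs covering the full recording.

    A sketch of the windowing a full diarization pipeline would need;
    stitching the per-chunk outputs back together is the hard part and
    is not shown here.
    """
    last_start = max(len(waveform) - CHUNK, 0)
    for start in range(0, last_start + 1, STEP):
        yield start, waveform[start : start + CHUNK]
```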