Hervé BREDIN committed
Commit: acf591c
Parent(s): e4aa6b9

doc: update README

README.md CHANGED
@@ -26,11 +26,7 @@ We propose (paid) scientific [consulting services](https://herve.niderb.fr/consu
 
 # 🎹 "Powerset" speaker segmentation
 
-
-
-It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
-
-It ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
+This model ingests (ideally 10s of) mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_.
 
 [image]
 
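As an aside to the hunk above: the 7 powerset classes it lists each correspond to a set of active speakers. A minimal, hypothetical Python sketch of that mapping (the list order follows the README's wording; names are illustrative, not taken from pyannote.audio's code):

```python
# Hypothetical sketch: map each powerset class index to the set of active
# speakers, in the order the README lists them (non-speech, speaker #1,
# speaker #2, speaker #3, #1+#2, #1+#3, #2+#3).
POWERSET_CLASSES = [
    set(),     # 0: non-speech
    {1},       # 1: speaker #1
    {2},       # 2: speaker #2
    {3},       # 3: speaker #3
    {1, 2},    # 4: speakers #1 and #2
    {1, 3},    # 5: speakers #1 and #3
    {2, 3},    # 6: speakers #2 and #3
]

def active_speakers(frame_class: int) -> set:
    """Return the set of speakers active in a frame, given its powerset class."""
    return POWERSET_CLASSES[frame_class]
```

For example, a frame assigned class 4 means speakers #1 and #2 overlap, while class 0 means nobody is speaking.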
@@ -51,6 +47,12 @@ to_multilabel = Powerset(
 multilabel_encoding = to_multilabel(powerset_encoding)
 ```
 
+The various concepts behind this model are described in detail in this [paper](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html).
+
+It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) `3.0.0` using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
+
+This [companion repository](https://github.com/FrenchKrab/IS2023-powerset-diarization/) by [Alexis Plaquet](https://frenchkrab.github.io/) also provides instructions on how to train or finetune such a model on your own data.
+
 ## Usage
 
 ```python
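For readers skimming the diff: the powerset-to-multilabel conversion used above (`to_multilabel = Powerset(...)`) can be illustrated with a small NumPy sketch. The class order and function name are assumptions for illustration, not pyannote.audio's actual implementation:

```python
import numpy as np

# Assumed class order, matching the README: non-speech, spk1, spk2, spk3,
# spk1+2, spk1+3, spk2+3 (speakers are 0-indexed here).
CLASS_TO_SPEAKERS = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]

def powerset_to_multilabel(powerset: np.ndarray, num_speakers: int = 3) -> np.ndarray:
    """Convert (num_frames, 7) powerset scores into a (num_frames, num_speakers)
    binary multilabel matrix by taking the winning class per frame."""
    classes = powerset.argmax(axis=-1)  # winning powerset class per frame
    multilabel = np.zeros((len(classes), num_speakers))
    for frame, cls in enumerate(classes):
        for speaker in CLASS_TO_SPEAKERS[cls]:
            multilabel[frame, speaker] = 1.0
    return multilabel

# One frame of overlapping speakers #1 and #2 (class 4), one frame of
# non-speech (class 0), encoded as one-hot powerset scores:
scores = np.eye(7)[[4, 0]]
print(powerset_to_multilabel(scores))
```

The actual `Powerset` module in pyannote.audio performs this conversion on tensors; the sketch only shows the underlying idea of expanding a single exclusive class into per-speaker activity.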
@@ -64,7 +66,7 @@ model = Model.from_pretrained("pyannote/segmentation-3.0.0",
 
 ### Speaker diarization
 
-This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s
+This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).
 
 See [pyannote/speaker-diarization-3.0.0](https://hf.co/pyannote/speaker-diarization-3.0.0) pipeline that uses an additional speaker embedding model to perform full recording speaker diarization.
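As background for the hunk above: since the model only processes 10s chunks, a full recording must be cut into windows before inference. A hedged sketch of such chunking (the 10s window and 5s hop are illustrative values, not the pipeline's actual settings):

```python
import numpy as np

def sliding_chunks(waveform: np.ndarray, sample_rate: int = 16000,
                   duration: float = 10.0, step: float = 5.0):
    """Yield (start_time, chunk) pairs cut from a mono waveform with a
    sliding window (duration and step in seconds; values are assumptions)."""
    size, hop = int(duration * sample_rate), int(step * sample_rate)
    for start in range(0, max(len(waveform) - size, 0) + 1, hop):
        yield start / sample_rate, waveform[start:start + size]

# A 25s recording at 16kHz yields 10s windows starting at 0s, 5s, 10s, 15s:
starts = [t for t, _ in sliding_chunks(np.zeros(25 * 16000))]
print(starts)  # → [0.0, 5.0, 10.0, 15.0]
```

The referenced pyannote/speaker-diarization-3.0.0 pipeline then stitches per-chunk predictions back together using a speaker embedding model; that aggregation step is what this segmentation model alone does not provide.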