sli521 committed (verified) · commit 27ecd0c · parent 9b97121

Update README.md

Files changed (1): README.md (+72 −3)
---
license: mit
language:
- en
---
# Configuration

This repository provides a speaker diarization model fine-tuned on synthetic medical audio data; the steps below describe how to set it up.

Before starting, please ensure the following requirements are met:

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept the [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files into your local directory

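Step 5 is easy to get wrong, so it can help to fail fast before any model loading. A minimal stdlib sketch (the `models/pyannote_sd_normal` directory is this README's example path, not a fixed convention):

```python
from pathlib import Path

def missing_files(model_dir, required=("pytorch_model.bin", "config.yaml")):
    """Return the names of required checkpoint files absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).exists()]

# Example: check the directory used in the snippets below
missing = missing_files("models/pyannote_sd_normal")
if missing:
    print(f"Download these files first: {missing}")
```

Calling this before the loading code below turns a cryptic `torch.load` error into a clear message.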
## Usage

### Load the trained segmentation model
```python
import torch
from pyannote.audio import Model

# Load the original architecture; replace use_auth_token with your own
# Hugging Face access token (or leave it as True to use a cached login)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded checkpoint files
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights from pytorch_model.bin
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin", map_location="cpu"))
```
### Load the fine-tuned speaker diarization pipeline
```python
from pyannote.audio import Pipeline
from pyannote.metrics.diarization import DiarizationErrorRate
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pyannote pipeline; replace use_auth_token
# with your own Hugging Face access token
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True)

# Build a new pipeline that swaps in the fine-tuned segmentation model
# loaded above (note the clustering attribute is spelled "klustering")
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load the fine-tuned hyperparameters from config.yaml
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
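The `DiarizationErrorRate` import above is there for evaluating the pipeline against reference annotations. The metric itself reduces to a simple ratio; a stdlib sketch of the formula (not the pyannote implementation, and the numbers are made up):

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed detection + speaker confusion) / total speech."""
    return (false_alarm + missed + confusion) / total_speech

# e.g. 1.0 s false alarm, 2.0 s missed, 0.5 s confused over 50 s of speech
print(diarization_error_rate(1.0, 2.0, 0.5, 50.0))  # 0.07
```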
### GPU usage
```python
# Move the pipeline to the GPU when one is available
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))
```

### Visualise the diarization output
```python
diarization = finetuned_pipeline("path/to/audio.wav")

# In a Jupyter notebook, evaluating `diarization` renders the annotation timeline
diarization
```
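Outside a notebook, diarization results are commonly exchanged as RTTM files (pyannote annotations can write RTTM themselves). A stdlib sketch of the field layout of a single RTTM line, using hypothetical values:

```python
def rttm_line(uri, start, duration, speaker):
    """Format one RTTM SPEAKER record: type, file, channel, onset,
    duration, then placeholder (<NA>) columns around the speaker label."""
    return (f"SPEAKER {uri} 1 {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>")

print(rttm_line("audio", 0.5, 3.2, "SPEAKER_00"))
# SPEAKER audio 1 0.500 3.200 <NA> <NA> SPEAKER_00 <NA> <NA>
```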

### View speaker turns, speaker IDs, and times
```python
# Iterate over speech turns with their track and speaker labels
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
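Downstream analysis often wants per-speaker statistics rather than raw turns. A stdlib sketch, with hypothetical `(start, end, speaker)` tuples standing in for the turns printed by the loop above:

```python
from collections import defaultdict

# Hypothetical speaker turns, mimicking the itertracks output
turns = [(0.0, 4.2, "SPEAKER_00"),
         (4.5, 9.1, "SPEAKER_01"),
         (9.3, 12.0, "SPEAKER_00")]

def total_speech_time(turns):
    """Sum speaking time (seconds) per speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

print(total_speech_time(turns))
```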