sli521 committed (verified) · commit 27ecd0c · parent 9b97121

Update README.md

Files changed (1): README.md (+72 −3)
---
license: mit
language:
- en
---
# Configuration

This repository provides a speaker diarization model fine-tuned on synthetic medical audio data; the steps below describe how to set it up.

Before starting, please ensure the following requirements are met:

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept the [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files into your local directory

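Step 5 is easy to get wrong, so it can help to fail fast before any model loading. A minimal stdlib sketch (the `models/pyannote_sd_normal` directory is this README's example path, not a fixed convention):

```python
from pathlib import Path

def missing_files(model_dir, required=("pytorch_model.bin", "config.yaml")):
    """Return the names of required checkpoint files absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).exists()]

# Example: check the directory used in the snippets below
missing = missing_files("models/pyannote_sd_normal")
if missing:
    print(f"Download these files first: {missing}")
```

Calling this before the loading code below turns a cryptic `torch.load` error into a clear message.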
## Usage

### Load the trained segmentation model
```python
import torch
from pyannote.audio import Model

# Load the original architecture; replace use_auth_token with your own
# Hugging Face access token (or leave it as True to use a cached login)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded checkpoint files
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights from pytorch_model.bin
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin", map_location="cpu"))
```
### Load the fine-tuned speaker diarization pipeline
```python
from pyannote.audio import Pipeline
from pyannote.metrics.diarization import DiarizationErrorRate
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pyannote pipeline; replace use_auth_token
# with your own Hugging Face access token
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True)

# Build a new pipeline that swaps in the fine-tuned segmentation model
# loaded above (note the clustering attribute is spelled "klustering")
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load the fine-tuned hyperparameters from config.yaml
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
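The `DiarizationErrorRate` import above is there for evaluating the pipeline against reference annotations. The metric itself reduces to a simple ratio; a stdlib sketch of the formula (not the pyannote implementation, and the numbers are made up):

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed detection + speaker confusion) / total speech."""
    return (false_alarm + missed + confusion) / total_speech

# e.g. 1.0 s false alarm, 2.0 s missed, 0.5 s confused over 50 s of speech
print(diarization_error_rate(1.0, 2.0, 0.5, 50.0))  # 0.07
```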
### GPU usage
```python
# Move the pipeline to the GPU when one is available
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))
```

### Visualise the diarization output
```python
diarization = finetuned_pipeline("path/to/audio.wav")

# In a Jupyter notebook, evaluating `diarization` renders the annotation timeline
diarization
```
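Outside a notebook, diarization results are commonly exchanged as RTTM files (pyannote annotations can write RTTM themselves). A stdlib sketch of the field layout of a single RTTM line, using hypothetical values:

```python
def rttm_line(uri, start, duration, speaker):
    """Format one RTTM SPEAKER record: type, file, channel, onset,
    duration, then placeholder (<NA>) columns around the speaker label."""
    return (f"SPEAKER {uri} 1 {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>")

print(rttm_line("audio", 0.5, 3.2, "SPEAKER_00"))
# SPEAKER audio 1 0.500 3.200 <NA> <NA> SPEAKER_00 <NA> <NA>
```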

### View speaker turns, speaker IDs, and times
```python
# Iterate over speech turns with their track and speaker labels
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
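Downstream analysis often wants per-speaker statistics rather than raw turns. A stdlib sketch, with hypothetical `(start, end, speaker)` tuples standing in for the turns printed by the loop above:

```python
from collections import defaultdict

# Hypothetical speaker turns, mimicking the itertracks output
turns = [(0.0, 4.2, "SPEAKER_00"),
         (4.5, 9.1, "SPEAKER_01"),
         (9.3, 12.0, "SPEAKER_00")]

def total_speech_time(turns):
    """Sum speaking time (seconds) per speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

print(total_speech_time(turns))
```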