Overview

This repository hosts the pre-trained DiariZen model described in "BUT System for the MLC-SLM Challenge". The EEND (end-to-end neural diarization) component is built on WavLM-Large and Conformer layers. The model was first pre-trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse. Structured pruning at 80% sparsity was then applied, and the pruned model was finally fine-tuned on MLC-SLM data. The model weights are released under the CC BY-NC 4.0 license, so please use this model for non-commercial purposes only.

Usage

from diarizen.pipelines.inference import DiariZenPipeline

# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-wavlm-large-s80-mlc")
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav')

# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

# load pre-trained model and save RTTM result
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-wavlm-large-s80-mlc",
    rttm_out_dir='.'
)
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav', sess_name='session_name')
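
Once an RTTM file has been written, it can be post-processed without loading the model again. The following is a minimal sketch, assuming the output follows the standard RTTM layout (`SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`). The helper names `parse_rttm` and `speaking_time` and the sample lines are hypothetical and not part of DiariZen; the sample is illustrative text, not real model output.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Yield (start, end, speaker) for each SPEAKER record in RTTM text lines."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        yield onset, onset + duration, fields[7]


def speaking_time(segments):
    """Sum the duration in seconds of each speaker's turns."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


# Hypothetical RTTM content used only for illustration.
sample_rttm = """\
SPEAKER session_name 1 0.00 2.50 <NA> <NA> 0 <NA> <NA>
SPEAKER session_name 1 2.50 1.50 <NA> <NA> 1 <NA> <NA>
SPEAKER session_name 1 4.00 3.00 <NA> <NA> 0 <NA> <NA>
"""

segments = list(parse_rttm(sample_rttm.splitlines()))
totals = speaking_time(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"speaker_{speaker}: {seconds:.1f}s")
```

To read a real file, pass an open file handle to `parse_rttm` instead of the sample lines. The exact output file name inside `rttm_out_dir` is not stated here, so check the directory after running the pipeline. The same `speaking_time` helper also works on tuples built from `diar_results.itertracks(yield_label=True)`.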

Results

DER (%) of the Pyannote baseline and DiariZen, evaluated with no collar applied. Lower is better.

Dataset	Pyannote	DiariZen
English-American	20.18	15.88
English-Australian	13.76	10.82
English-British	18.85	12.07
English-Filipino	13.19	10.28
English-Indian	8.19	6.04
French	22.62	17.33
German	22.33	16.35
Italian	10.64	8.85
Japanese	26.46	17.81
Korean	23.25	16.36
Portuguese	17.60	14.77
Russian	11.37	9.99
Spanish	12.92	10.82
Thai	10.90	10.62
Vietnamese	14.64	12.69
Average	16.44	12.71
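
The absolute numbers above can be easier to compare as relative DER reductions. The snippet below is a small sketch of that arithmetic for three rows copied from the table; the function name `relative_reduction` is only illustrative.

```python
def relative_reduction(baseline, system):
    """Relative error reduction in percent, given two DER values in percent."""
    return 100.0 * (baseline - system) / baseline


# (Pyannote, DiariZen) DER values copied from the table above.
der = {
    "Japanese": (26.46, 17.81),
    "Thai": (10.90, 10.62),
    "Average": (16.44, 12.71),
}

reductions = {
    name: round(relative_reduction(baseline, system), 1)
    for name, (baseline, system) in der.items()
}
for name, value in reductions.items():
    print(f"{name}: {value}% relative DER reduction")
```

On these rows the relative reduction ranges from about 2.6% (Thai) to about 32.7% (Japanese), with about 22.7% on the reported average.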

Citation

If you find this work helpful, please consider citing:

@article{polok2025but,
  title={BUT System for the MLC-SLM Challenge},
  author={Polok, Alexander and Han, Jiangyu and Klement, Dominik and Cornell, Samuele and {\v{C}}ernock{\`y}, Jan and Burget, Luk{\'a}{\v{s}}},
  journal={arXiv preprint arXiv:2506.13414},
  year={2025}
}

License

Source code: MIT (see the project’s GitHub repository).
Model weights: CC BY-NC 4.0 (non-commercial).
Rationale: some training datasets are research-only or non-commercial, so the released weights cannot be used commercially.
