Overview

This repository hosts the pre-trained DiariZen speaker diarization model described in "BUT System for the MLC-SLM Challenge". Its EEND (end-to-end neural diarization) component is built on WavLM-Large and Conformer layers. The model was first pre-trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse. Structured pruning at 80% sparsity was then applied, and the pruned model was finally fine-tuned on MLC-SLM data. The model weights are released under the CC BY-NC 4.0 license, so please use them for non-commercial purposes only.

Usage

from diarizen.pipelines.inference import DiariZenPipeline

# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-wavlm-large-s80-mlc")
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav')

# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

# load pre-trained model and save RTTM result
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-wavlm-large-s80-mlc",
    rttm_out_dir='.'
)
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav', sess_name='session_name')
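Once an RTTM file has been written, it can be post-processed with plain Python. The following is a minimal sketch, not part of DiariZen, that sums the speaking time per speaker. It assumes the standard RTTM layout, in which each `SPEAKER` record carries the onset in the fourth field, the duration in the fifth, and the speaker label in the eighth. The helper name `speaking_time_from_rttm` and the sample records are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def speaking_time_from_rttm(lines):
    """Sum speech duration (in seconds) per speaker from RTTM lines."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: segment duration
        speaker = fields[7]          # eighth field: speaker label
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content, made up for this example.
example = """\
SPEAKER session_name 1 0.00 2.50 <NA> <NA> 0 <NA> <NA>
SPEAKER session_name 1 2.50 1.25 <NA> <NA> 1 <NA> <NA>
SPEAKER session_name 1 4.00 3.00 <NA> <NA> 0 <NA> <NA>
""".splitlines()

print(speaking_time_from_rttm(example))
```

To process a real result, pass the lines of the RTTM file found in `rttm_out_dir` instead of `example`. The exact file name depends on how the pipeline names its output, so check the directory rather than assuming a name.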

Results

Diarization error rate (DER, in %; lower is better) of the Pyannote baseline and DiariZen, evaluated with no collar applied.

Dataset             Pyannote  DiariZen
English-American       20.18     15.88
English-Australian     13.76     10.82
English-British        18.85     12.07
English-Filipino       13.19     10.28
English-Indian          8.19      6.04
French                 22.62     17.33
German                 22.33     16.35
Italian                10.64      8.85
Japanese               26.46     17.81
Korean                 23.25     16.36
Portuguese             17.60     14.77
Russian                11.37      9.99
Spanish                12.92     10.82
Thai                   10.90     10.62
Vietnamese             14.64     12.69
Average                16.44     12.71
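
To make the gap between the two systems easier to read, the table can be turned into relative DER reductions. The sketch below copies the numbers from the table above. The `relative_reduction` helper is a hypothetical name introduced for this example, and the computation is simple arithmetic on the reported values rather than a new evaluation.

```python
# DER (%) per dataset, copied from the table: (Pyannote, DiariZen).
der = {
    "English-American": (20.18, 15.88),
    "English-Australian": (13.76, 10.82),
    "English-British": (18.85, 12.07),
    "English-Filipino": (13.19, 10.28),
    "English-Indian": (8.19, 6.04),
    "French": (22.62, 17.33),
    "German": (22.33, 16.35),
    "Italian": (10.64, 8.85),
    "Japanese": (26.46, 17.81),
    "Korean": (23.25, 16.36),
    "Portuguese": (17.60, 14.77),
    "Russian": (11.37, 9.99),
    "Spanish": (12.92, 10.82),
    "Thai": (10.90, 10.62),
    "Vietnamese": (14.64, 12.69),
}


def relative_reduction(baseline, system):
    """Relative DER reduction of `system` over `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


reductions = {name: relative_reduction(b, s) for name, (b, s) in der.items()}

# Print datasets from largest to smallest relative improvement.
for name, red in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20s} {red:5.1f}%")

# Relative reduction computed from the reported averages (16.44 -> 12.71).
avg_reduction = relative_reduction(16.44, 12.71)
print(f"{'Average':<20s} {avg_reduction:5.1f}%")
```

On the reported averages this amounts to a relative DER reduction of roughly 23%, with the largest relative gain on English-British and the smallest on Thai.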

Citation

If you found this work helpful, please consider citing:

@article{polok2025but,
  title={BUT System for the MLC-SLM Challenge},
  author={Polok, Alexander and Han, Jiangyu and Klement, Dominik and Cornell, Samuele and {\v{C}}ernock{\`y}, Jan and Burget, Luk{\'a}{\v{s}}},
  journal={arXiv preprint arXiv:2506.13414},
  year={2025}
}

License

  • Source code: MIT (see the project’s GitHub repository).
  • Model weights: CC BY-NC 4.0 (non-commercial).
  • Rationale: some training datasets are research-only or non-commercial, so the released weights cannot be used commercially.