---
tags:
- speech-recognition
- audio
- chunkformer
- ctc
- pytorch
- transformers
- automatic-speech-recognition
- long-form transcription
- asr
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
# ChunkFormer Model
[GitHub](https://github.com/khanld/chunkformer)
[Paper](https://arxiv.org/abs/2502.14673)
## Usage
Install the package:
```bash
pip install chunkformer
```
```python
from chunkformer import ChunkFormerModel

# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkFormer-ctc-small-libri-960h")

# For long-form audio transcription
transcription = model.endless_decode(
    audio_path="path/to/your/audio.wav",
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
    return_timestamps=True
)
print(transcription)

# For batch processing
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
transcriptions = model.batch_decode(
    audio_paths=audio_files,
    chunk_size=64,
    left_context_size=128,
    right_context_size=128
)
```
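Since `batch_decode` returns one transcription per input path (paired by index), a common follow-up step is to persist each result next to its source audio. Below is a minimal sketch of that post-processing; the `save_transcriptions` helper and its file-naming convention are our own, not part of the `chunkformer` package, and the placeholder strings stand in for real `model.batch_decode` output:

```python
from pathlib import Path


def save_transcriptions(audio_paths, transcriptions, out_dir="transcripts"):
    """Write each transcription to <out_dir>/<audio stem>.txt, paired by index."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for audio, text in zip(audio_paths, transcriptions):
        target = out / (Path(audio).stem + ".txt")
        target.write_text(text + "\n", encoding="utf-8")
        written.append(target)
    return written


# Placeholder results standing in for model.batch_decode output:
files = save_transcriptions(
    ["audio1.wav", "audio2.wav"],
    ["hello world", "good morning"],
)
```

Pairing by `zip` keeps the mapping between audio files and transcriptions explicit, which matters when some inputs are long-form and processed out of order upstream.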
## Training
This model was trained with the ChunkFormer framework. For details on the training process and access to the source code, see https://github.com/khanld/chunkformer.

Paper: https://arxiv.org/abs/2502.14673
## Citation
If you use this work in your research, please cite:
```bibtex
@INPROCEEDINGS{10888640,
    author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
    booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
    year={2025},
    volume={},
    number={},
    pages={1-5},
    keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
    doi={10.1109/ICASSP49660.2025.10888640}}
```