---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---

## Description
Models pretrained on FineWeb-Edu following the paper *RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling*.
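Below is a minimal loading sketch, assuming the checkpoints are hosted on the Hugging Face Hub and ship custom modeling code (the card lists `library_name: RAT`, so `trust_remote_code=True` is assumed to be needed). The repository id in the snippet is a placeholder; substitute the actual checkpoint name.

```python
# Minimal sketch for loading a RAT checkpoint with transformers.
# Assumptions: the checkpoint is on the Hub with custom modeling code,
# and "your-org/rat-checkpoint" is a hypothetical placeholder repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/rat-checkpoint"  # placeholder; replace with a real repo

# trust_remote_code is assumed to be required, since RAT is a custom
# architecture rather than one built into transformers.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```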
## Citation

If you find this work useful, please consider citing the paper:
```bibtex
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```