---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---

## Description

Models trained as described in the [RAT paper](https://arxiv.org/abs/2507.04416), which bridges RNN efficiency and attention accuracy via chunk-based sequence modeling. A hedged loading sketch appears after the citation below.

## Citation

If you find these models useful, please consider citing the paper:

```
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```
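
## Usage

A minimal loading sketch, assuming the checkpoints are published in a transformers-compatible format with custom RAT modeling code (hence `trust_remote_code=True`). The repo id below is a placeholder, and this loading path is an assumption, not confirmed by this card; the RAT codebase may provide its own loading utilities instead.

```python
# Hypothetical loading example; the repo id is a placeholder, and
# transformers compatibility of these checkpoints is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/<rat-checkpoint>"  # substitute an actual checkpoint name

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```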