---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---
## Description
Pretrained models from the paper [RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling](https://arxiv.org/abs/2507.04416), trained on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
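## Usage
A minimal sketch for loading a checkpoint from this repository. The repository id (`barpitf/RAT`) is a placeholder, and the use of the standard `transformers` Auto classes with `trust_remote_code=True` is an assumption; the RAT library may ship its own loading utilities, so check its documentation for the supported path.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute the actual checkpoint name.
repo_id = "barpitf/RAT"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # custom architectures require remote code
)

# Generate a short continuation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```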
## Citation
If you find this work useful, please consider citing the paper:
```bibtex
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```