---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---

## Description

Pretrained language models from the [RAT paper](https://arxiv.org/abs/2507.04416), which bridges RNN efficiency and attention accuracy via chunk-based sequence modeling.
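
If the checkpoints are hosted on the Hugging Face Hub, they can be fetched with `huggingface_hub`. Below is a minimal sketch, not an official loading path: the repo id is a hypothetical placeholder, and the weights are presumably loaded through the RAT codebase rather than `transformers`.

```python
# Minimal sketch: download checkpoint files from the Hugging Face Hub.
# "your-org/rat-model" is a hypothetical placeholder; substitute the
# actual model repository id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="your-org/rat-model")  # placeholder repo id
print(f"Checkpoint files downloaded to: {local_dir}")
```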

## Citation

If you find this work useful, please consider citing the paper:

```bibtex
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```