Update README.md
README.md CHANGED
@@ -113,8 +113,16 @@ This model is part of a collection of LayerNorm-free models. The table below pro
 
 ## Citation
 
-
-
-
-
-
+If you have found our work useful, please cite as:
+
+```
+@misc{gpt2layernorm2025,
+      author = {Baroni, Luca and Khara, Galvin and Schaeffer, Joachim and Subkhankulov, Marat and Heimersheim, Stefan},
+      title = {Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability},
+      year = {2025},
+      eprint = {2507.02559},
+      archivePrefix = {arXiv},
+      primaryClass = {cs.LG},
+      url = {https://arxiv.org/abs/2507.02559v1}
+}
+```