update readme
README.md
CHANGED
@@ -12,7 +12,7 @@ tags:
 <br>Aria</br>
 </p> -->
 
-This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The
+This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The only modification is replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
 
 While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
 
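
To make the change in the added line concrete, here is a minimal sketch of what a "sequential MLP" expert block can look like in PyTorch. It is an illustration under stated assumptions, not the code in this repository: the class name `SequentialExperts`, the top-1 routing, the GELU activation, and the two-layer up/down shape are all hypothetical choices made to keep the example short.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SequentialExperts(nn.Module):
    """Hypothetical MoE feed-forward block: each expert is a pair of plain
    nn.Linear layers, and the experts are run one after another in a loop
    instead of being fused into a single grouped GEMM call."""

    def __init__(self, num_experts: int, hidden: int, intermediate: int):
        super().__init__()
        self.up = nn.ModuleList(
            [nn.Linear(hidden, intermediate, bias=False) for _ in range(num_experts)]
        )
        self.down = nn.ModuleList(
            [nn.Linear(intermediate, hidden, bias=False) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden); expert_ids: (tokens,), one expert per token
        # (top-1 routing is an assumption made for this sketch).
        out = torch.zeros_like(x)
        for e, (up, down) in enumerate(zip(self.up, self.down)):
            mask = expert_ids == e
            if mask.any():
                # Only the tokens routed to expert e pass through its layers.
                out[mask] = down(F.gelu(up(x[mask])))
        return out


torch.manual_seed(0)
moe = SequentialExperts(num_experts=4, hidden=8, intermediate=16)
tokens = torch.randn(6, 8)
routing = torch.tensor([0, 1, 1, 3, 0, 3])
with torch.no_grad():
    y = moe(tokens, routing)

# Every expert weight lives in an ordinary nn.Linear module.
linear_names = [name for name, m in moe.named_modules() if isinstance(m, nn.Linear)]
```

Because every expert weight sits in an ordinary `nn.Linear`, a tool that walks `named_modules()` looking for `nn.Linear` layers can find each expert individually, which is the quantization convenience the README describes. The cost is the Python-level loop over experts, which a grouped GEMM kernel avoids.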