sheldonrobinson/Aria-sequential_mlp

Image-Text-to-Text


aria-dev committed on Oct 18, 2024

Commit 021f586 · 1 Parent(s): 5c6db29

update readme

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -12,7 +12,7 @@ tags:
   <br>Aria</br>
 </p>  -->
-This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The main modification involves replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
+This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The only modification is replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
 While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
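
For readers unfamiliar with the term, here is a minimal sketch of what "each expert is implemented as a `torch.nn.Linear` layer executed in sequence" could look like. It is not the code from this repository: the class name `SequentialExperts`, the layer sizes, the GELU activation, and the top-1 routing via `expert_ids` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequentialExperts(nn.Module):
    """Hypothetical sketch: MoE experts stored as plain nn.Linear layers.

    Assumes top-1 routing and a simple up/GELU/down MLP per expert;
    the real Aria expert layout may differ.
    """

    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int):
        super().__init__()
        # One pair of nn.Linear layers per expert, so quantization tools
        # that look for nn.Linear modules can find every expert weight.
        self.up = nn.ModuleList(
            [nn.Linear(hidden_size, ffn_size, bias=False) for _ in range(num_experts)]
        )
        self.down = nn.ModuleList(
            [nn.Linear(ffn_size, hidden_size, bias=False) for _ in range(num_experts)]
        )

    def forward(self, tokens: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, hidden_size); expert_ids: (num_tokens,)
        out = torch.zeros_like(tokens)
        # Experts run one after another instead of in a single grouped GEMM call.
        for e, (up, down) in enumerate(zip(self.up, self.down)):
            mask = expert_ids == e
            if mask.any():
                out[mask] = down(F.gelu(up(tokens[mask])))
        return out


if __name__ == "__main__":
    torch.manual_seed(0)
    experts = SequentialExperts(hidden_size=8, ffn_size=16, num_experts=4)
    x = torch.randn(10, 8)
    ids = torch.randint(0, 4, (10,))
    with torch.no_grad():
        y = experts(x, ids)
    print(y.shape)
```

The Python loop over experts is what makes this layout slower than grouped GEMM, which handles all experts in one batched kernel call; in exchange, every expert weight lives in a standard `nn.Linear` module.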