The abliteration script ([link](https://github.com/IlyaGusev/saiga/blob/main/scripts/abliterate.py)) is based on code from the blog post and heavily uses [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens). The only major difference from the code used for Llama is [scaling the embedding layer back](https://github.com/TransformerLensOrg/TransformerLens/blob/main/transformer_lens/pretrained/weight_conversions/gemma.py#L13).
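
As a rough illustration of that difference (a minimal sketch, not the exact code from the script; `unscale_gemma_embedding` is an illustrative name): TransformerLens folds Gemma's `sqrt(d_model)` embedding scaling into `W_E` when loading the model, so the edited embedding has to be divided by `sqrt(d_model)` before it is written back to a Hugging Face checkpoint.

```python
import torch

def unscale_gemma_embedding(w_e: torch.Tensor, d_model: int) -> torch.Tensor:
    # TransformerLens multiplies Gemma's embedding matrix by sqrt(d_model)
    # during weight conversion; divide it back out before exporting the
    # edited weights to a Hugging Face checkpoint.
    return w_e / (d_model ** 0.5)
```
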
Orthogonalization **did not** produce the same results as regular interventions, since RMSNorm layers are applied before activations are merged into the residual stream. However, the final model still seems to be uncensored.
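
For context, this is the kind of projection involved; a minimal sketch of weight orthogonalization in the spirit of the blog post, not the exact code from the script (`orthogonalize` is an illustrative name):

```python
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Project the refusal direction out of a matrix that writes to the
    # residual stream (e.g. W_O or W_out in TransformerLens, both of shape
    # [..., d_model]): subtract each output vector's component along it.
    direction = direction / direction.norm()
    return weight - (weight @ direction).unsqueeze(-1) * direction
```

Because Gemma applies RMSNorm (an RMS rescale plus a learned element-wise gain) between these matrices and the residual stream, the post-norm output can regain a component along the ablated direction, so editing the weights is not exactly equivalent to ablating the activations at runtime.
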
## Examples: