# Qwen3.5-122B-A10B-abliterated-GGUF
GGUF quantized versions of wangzhang/Qwen3.5-122B-A10B-abliterated, an abliterated (decensored) build of Qwen/Qwen3.5-122B-A10B created with Abliterix.
## Available Quantizations
| File | Quantization | Size | Quality | Recommended Hardware |
|---|---|---|---|---|
| Q4_K_M | Q4_K_M | 70 GB | High | 1x 80 GB GPU, or CPU with 96 GB+ RAM |
| Q8_0 | Q8_0 | 121 GB | Very High | 2x 80 GB GPUs, or CPU with 160 GB+ RAM |
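As a rough cross-check of the sizes above, a GGUF file's size can be estimated from the quantization's average bits per weight. The figures below (~4.8 bpw for Q4_K_M, ~8.5 bpw for Q8_0) are typical llama.cpp averages, not exact values for this file; real sizes also depend on metadata and the per-tensor quant mix:

```shell
# Rough file-size estimate: params * avg bits-per-weight / 8, in GiB.
# 122e9 parameters; ~4.8 bpw (Q4_K_M) and ~8.5 bpw (Q8_0) are typical averages.
q4_gib=$(awk 'BEGIN { printf "%.0f", 122e9 * 4.8 / 8 / (1024^3) }')
q8_gib=$(awk 'BEGIN { printf "%.0f", 122e9 * 8.5 / 8 / (1024^3) }')
echo "Q4_K_M ~ ${q4_gib} GiB, Q8_0 ~ ${q8_gib} GiB"
```

This lands in the same ballpark as the table (the small gap comes from the GB/GiB distinction and per-tensor variation).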
## About the Source Model
- 95% refusal reduction: from 100/100 refusals down to 5/100
- MoE architecture: Qwen3.5-122B-A10B activates only ~10B parameters per token — 14B-class speed with 122B-class knowledge
- Minimal capability loss: KL divergence of just 0.0878 from the original model
- 50-trial Optuna TPE optimization: Automated Bayesian hyperparameter search with multi-objective Pareto optimization
- Orthogonalized abliteration: Surgical removal of refusal directions without degrading general intelligence
See wangzhang/Qwen3.5-122B-A10B-abliterated for full details on the abliteration process and steering parameters.
## Usage

### llama.cpp

```shell
# Q4_K_M (recommended for a single GPU)
./llama-cli -m Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf -p "Your prompt here" -n 512

# Q8_0 (higher quality)
./llama-cli -m Qwen3.5-122B-A10B-abliterated-Q8_0.gguf -p "Your prompt here" -n 512
```
### llama-server

```shell
./llama-server -m Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```
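Recent llama.cpp builds of `llama-server` expose an OpenAI-compatible chat endpoint. A minimal request against the server started above might look like this (the port matches the command above; adjust the prompt and `max_tokens` to taste):

```shell
# Query the running llama-server via its OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "max_tokens": 128
  }'
```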
### Ollama

```shell
# Create a Modelfile pointing at the local GGUF file
echo "FROM ./Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf" > Modelfile
ollama create qwen3.5-122b-abliterated -f Modelfile
ollama run qwen3.5-122b-abliterated
```
## VRAM / RAM Requirements

| Quantization | Full GPU Offload | Partial Offload (32 layers) | CPU Only |
|---|---|---|---|
| Q4_K_M | ~74 GB (1x 80 GB) | ~40 GB VRAM + ~40 GB RAM | ~80 GB RAM |
| Q8_0 | ~130 GB (2x 80 GB) | ~70 GB VRAM + ~70 GB RAM | ~140 GB RAM |
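The partial-offload column corresponds to splitting layers between GPU and CPU. In llama.cpp this is controlled by `-ngl` (`--n-gpu-layers`); a sketch for the 32-layer split in the table, with the context size as an illustrative choice:

```shell
# Offload 32 layers to the GPU, keep the rest in system RAM.
# Tune -ngl up or down until the model just fits in your VRAM.
./llama-cli -m Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf \
  -ngl 32 -c 8192 -p "Your prompt here" -n 512
```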
## Disclaimer
This model is provided for research purposes only. The creator is not responsible for any misuse.
## Credits
- Base model: Qwen/Qwen3.5-122B-A10B by Alibaba Qwen team
- Abliteration: wangzhang/Qwen3.5-122B-A10B-abliterated
- Abliteration framework: Abliterix
- Quantization: llama.cpp