
Qwen3.5-122B-A10B-abliterated-GGUF

GGUF quantized versions of wangzhang/Qwen3.5-122B-A10B-abliterated, a decensored variant of Qwen/Qwen3.5-122B-A10B created using Abliterix.

Available Quantizations

| File | Quantization | Size | Quality | Use Case |
|------|--------------|------|---------|----------|
| Q4_K_M | Q4_K_M | 70 GB | High | 1x 80GB GPU, or CPU with 96GB+ RAM |
| Q8_0 | Q8_0 | 121 GB | Very High | 2x 80GB GPUs, or CPU with 160GB+ RAM |

About the Source Model

  • 95% refusal reduction: Reduced from 100/100 refusals to just 5/100 (5%)
  • MoE architecture: Qwen3.5-122B-A10B activates only ~10B parameters per token — 14B-class speed with 122B-class knowledge
  • Minimal capability loss: KL divergence of just 0.0878 from the original model
  • 50-trial Optuna TPE optimization: Automated Bayesian hyperparameter search with multi-objective Pareto optimization
  • Orthogonalized abliteration: Surgical removal of refusal directions without degrading general intelligence

See wangzhang/Qwen3.5-122B-A10B-abliterated for full details on the abliteration process and steering parameters.
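The "orthogonalized abliteration" mentioned above can be sketched in a few lines of NumPy. The idea: given a refusal direction r (typically extracted from activation differences between harmful and harmless prompts), project the rank-1 component along r out of each weight matrix that writes to the residual stream, so the layer can no longer emit that direction. This is a schematic illustration under those assumptions, not Abliterix's actual implementation:

```python
import numpy as np

def abliterate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Schematic orthogonalized abliteration: remove the rank-1
    component of W along a refusal direction r, i.e. W' = W - r r^T W.
    Illustrative only -- not Abliterix's actual code."""
    r = r / np.linalg.norm(r)        # unit refusal direction
    return W - np.outer(r, r) @ W    # project r out of W's output space

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # stand-in for a weight matrix
r = rng.standard_normal(8)           # stand-in refusal direction
W_abl = abliterate(W, r)

# The edited matrix can no longer write anything along r:
print(np.allclose(r @ W_abl, 0))  # True
```

Because the edit is a projection rather than fine-tuning, the rest of the weight matrix is untouched, which is why the KL divergence from the original model can stay small.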

Usage

llama.cpp

```bash
# Q4_K_M (recommended for single GPU)
./llama-cli -m Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf -p "Your prompt here" -n 512

# Q8_0 (higher quality)
./llama-cli -m Qwen3.5-122B-A10B-abliterated-Q8_0.gguf -p "Your prompt here" -n 512
```

llama-server

```bash
./llama-server -m Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```
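Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal Python sketch that builds a chat-completion request for it, using only the standard library (the port matches the `--port` flag above; the prompt is illustrative):

```python
import json
import urllib.request

def chat_request(prompt: str, base_url: str = "http://localhost:8080"):
    """Build an OpenAI-compatible chat-completion request for llama-server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Your prompt here")
# Send with urllib.request.urlopen(req) once the server is up.
print(req.full_url)
```

Any OpenAI-compatible client library pointed at `http://localhost:8080/v1` should work the same way.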

Ollama

```bash
# Create a Modelfile
echo "FROM ./Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf" > Modelfile
ollama create qwen3.5-122b-abliterated -f Modelfile
ollama run qwen3.5-122b-abliterated
```
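A Modelfile can also set default sampling parameters alongside the GGUF path. A slightly fuller example (the parameter values here are illustrative defaults, not tuned recommendations for this model):

```
FROM ./Qwen3.5-122B-A10B-abliterated-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

Parameters set in the Modelfile become the defaults for every `ollama run` session with that model.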

VRAM / RAM Requirements

| Quantization | Full GPU Offload | Partial Offload (32 layers) | CPU Only |
|--------------|------------------|-----------------------------|----------|
| Q4_K_M | ~74 GB (1x 80GB) | ~40 GB GPU + 40 GB RAM | ~80 GB RAM |
| Q8_0 | ~130 GB (2x 80GB) | ~70 GB GPU + 70 GB RAM | ~140 GB RAM |
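The figures above follow from a simple back-of-the-envelope estimate: file size is roughly parameter count times effective bits per weight, and full offload needs that plus headroom for the KV cache and compute buffers. A rough sketch (the bits-per-weight values and the 5% overhead factor are assumptions, not exact GGUF numbers):

```python
def est_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Rough GGUF memory footprint in GB: parameters (billions) times
    effective bits per weight, plus ~5% for metadata and non-quantized
    tensors (assumed overhead)."""
    return params_b * bits_per_weight / 8 * 1.05

# Assumed effective bits/weight: Q4_K_M ~4.5, Q8_0 ~8.0
print(f"Q4_K_M: ~{est_size_gb(122, 4.5):.0f} GB")
print(f"Q8_0:   ~{est_size_gb(122, 8.0):.0f} GB")
```

Actual requirements also scale with context length (KV cache), so budget extra RAM/VRAM for long contexts.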

Disclaimer

This model is provided for research purposes only. The creator is not responsible for any misuse.

Credits
