# LFM2-24B-A2B-abliterated

Unrestricted version of LiquidAI/LFM2-24B-A2B, created using Abliterix.

This is the first abliterated model based on Liquid AI's hybrid gated short convolution + grouped query attention architecture with Mixture of Experts.

## Model Details

| Property | Value |
|---|---|
| Base Model | LiquidAI/LFM2-24B-A2B |
| Architecture | Hybrid Conv + GQA with MoE (64 experts, top-4 routing) |
| Parameters | 24B total / 2.3B active per token |
| Layers | 40 (10 attention + 30 convolution) |
| Hidden Size | 2048 |
| Context Length | 128K tokens |
| Precision | BF16 |
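
The "64 experts, top-4 routing" entry means each token is processed by only 4 of the 64 experts per MoE layer, which is why just 2.3B of the 24B parameters are active per token. A toy sketch of top-k gating (a hypothetical simplification; the real router is a learned linear layer, and `route_top_k` is an invented helper name):

```python
import numpy as np

def route_top_k(router_logits, k=4):
    """Select the top-k experts for one token and softmax-normalize
    their gate weights (simplified illustration of MoE routing)."""
    top = np.argsort(router_logits)[-k:][::-1]      # indices of the k largest logits
    gates = np.exp(router_logits[top] - router_logits[top].max())
    return top, gates / gates.sum()

logits = np.random.default_rng(0).normal(size=64)   # one token, 64 experts
experts, gates = route_top_k(logits)
print(len(experts), gates.sum())
```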

## Performance

| Metric | This model | Original |
|---|---|---|
| KL divergence | 0.0079 | 0 |
| Refusals | 0/100 (0%) | 90/100 (90%) |

Evaluated with an LLM judge (Gemini Flash) on 100 harmful prompts. A KL divergence of 0.0079 indicates that the model's general capabilities are virtually identical to the original's.
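
The KL divergence measures how far the abliterated model's next-token distributions drift from the original's. A minimal sketch of the per-position quantity (assuming it is computed between the two models' logits and averaged over held-out tokens; `kl_divergence` is an illustrative helper, not part of the actual evaluation harness):

```python
import math

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two softmax distributions over the vocabulary."""
    def softmax(logits):
        m = max(logits)                              # subtract max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero divergence; a small perturbation gives a small positive value
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_divergence([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))
```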

## How It Was Made

1. Computed refusal directions from 400 harmful vs. 400 benign prompt pairs across all 40 layers
2. Applied orthogonalized abliteration to isolate refusal-specific activation patterns
3. Steered three component types independently: convolution output projections, attention output projections, and MLP/expert down-projections
4. Profiled MoE expert activations across 38 router layers to identify safety-critical experts
5. Applied hybrid MoE steering: router weight suppression (25 experts, bias = -0.41) plus fused expert abliteration (weight = 2.79)
6. Optimized via Optuna TPE (trial #10 of 50, with 15 warmup trials)
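
Step 1 is typically a difference-of-means computation over captured residual-stream activations. A sketch under that assumption (the NumPy stand-in, `refusal_direction`, and the toy data are illustrative, not the actual Abliterix pipeline):

```python
import numpy as np

def refusal_direction(harmful_acts, benign_acts):
    """Unit-norm difference-of-means direction for one layer.

    harmful_acts, benign_acts: (n_prompts, hidden_size) arrays of
    activations captured at the same layer and token position.
    """
    direction = harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Toy stand-in data: hidden_size=2048 as in LFM2, 400 prompts per set
rng = np.random.default_rng(0)
harmful = rng.normal(0.5, 1.0, (400, 2048))   # simulated harmful-prompt activations
benign = rng.normal(0.0, 1.0, (400, 2048))    # simulated benign-prompt activations
r = refusal_direction(harmful, benign)
print(r.shape)
```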

This is notable as the first successful abliteration of a non-transformer hybrid architecture: LFM2's gated short convolution blocks required novel steering targets beyond the standard attention/MLP pairs.
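
For the output projections in steps 2-3, orthogonalized abliteration commonly amounts to projecting the refusal direction out of each weight matrix that writes to the residual stream, i.e. W' = W - r rᵀ W for unit vector r. A hedged sketch of that operation (`abliterate_weight` is an invented name, not an Abliterix API):

```python
import numpy as np

def abliterate_weight(W, r):
    """Remove the refusal direction r from a projection matrix W
    that writes to the residual stream (shape: hidden_size x in_features)."""
    r = r / np.linalg.norm(r)            # ensure unit norm
    return W - np.outer(r, r) @ W        # W' = W - r r^T W

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 512))         # toy output-projection weights
r = rng.normal(size=2048)                # toy refusal direction
W_new = abliterate_weight(W, r)
# The edited matrix can no longer write anything along r:
print(np.abs((r / np.linalg.norm(r)) @ W_new).max())
```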

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "wangzhang/LFM2-24B-A2B-abliterated",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("wangzhang/LFM2-24B-A2B-abliterated")

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

## Hardware Requirements

| Precision | VRAM |
|---|---|
| BF16 | ~48 GB (A100 80GB, H100) |
| INT8 | ~24 GB (A40, RTX 4090) |
| NF4 | ~12 GB (RTX 3090, RTX 4080) |
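
For the NF4 row, loading with 4-bit quantization via `bitsandbytes` might look like the following (an untested sketch; it assumes the `bitsandbytes` package is installed and that the model's custom convolution layers tolerate 4-bit loading):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization config; settings here are reasonable defaults, not card-verified
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "wangzhang/LFM2-24B-A2B-abliterated",
    quantization_config=bnb_config,
    device_map="auto",
)
```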

Note: This model must run on a single GPU; the convolution layers do not support multi-GPU splitting via accelerate's `device_map`.

## Disclaimer

This model is intended for research purposes only. The removal of safety guardrails means the model will comply with requests that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations.


Made with Abliterix
