# LFM2-24B-A2B-abliterated
Unrestricted version of LiquidAI/LFM2-24B-A2B, created using Abliterix.
This is the first abliterated model based on Liquid AI's hybrid architecture, which combines gated short-convolution blocks and grouped-query attention with a Mixture of Experts.
## Model Details
| Property | Value |
|---|---|
| Base Model | LiquidAI/LFM2-24B-A2B |
| Architecture | Hybrid Conv + GQA with MoE (64 experts, top-4 routing) |
| Parameters | 24B total / 2.3B active per token |
| Layers | 40 (10 attention + 30 convolution) |
| Hidden Size | 2048 |
| Context Length | 128K tokens |
| Precision | BF16 |
## Performance
| Metric | This model | Original |
|---|---|---|
| KL divergence | 0.0079 | 0 |
| Refusals | 0/100 (0%) | 90/100 (90%) |
Evaluated with an LLM judge (Gemini Flash) on 100 harmful prompts. A KL divergence of 0.0079 indicates that the model's general capabilities are virtually identical to the original's.
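The KL number above compares the two models' next-token distributions; lower means less behavioral drift from the original. A minimal sketch of the metric, assuming access to both models' raw logits (the evaluation prompts and the direction of the divergence are not specified in this card):

```python
import torch
import torch.nn.functional as F

def mean_kl(logits_ablit: torch.Tensor, logits_orig: torch.Tensor) -> float:
    """Mean per-token KL(original || abliterated) over a batch of positions.

    Both tensors have shape (num_tokens, vocab_size) and hold raw logits.
    """
    logp_a = F.log_softmax(logits_ablit, dim=-1)
    logp_o = F.log_softmax(logits_orig, dim=-1)
    p_o = logp_o.exp()
    # KL(P_orig || P_ablit) = sum_v p_o(v) * (log p_o(v) - log p_a(v))
    return (p_o * (logp_o - logp_a)).sum(dim=-1).mean().item()

# Identical logits give zero divergence
x = torch.randn(4, 100)
print(mean_kl(x, x))  # → 0.0
```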
## How It Was Made
- Computed refusal directions from 400 harmful vs 400 benign prompt pairs across all 40 layers
- Applied orthogonalized abliteration to isolate refusal-specific activation patterns
- Steered three component types independently: convolution output projections, attention output projections, and MLP/expert down-projections
- Profiled MoE expert activations across 38 router layers to identify safety-critical experts
- Applied hybrid MoE steering: router weight suppression (25 experts, bias=-0.41) + fused expert abliteration (weight=2.79)
- Optimized via Optuna TPE (trial #10 of 50, with 15 warmup trials)
This is notable as the first successful abliteration of a non-transformer hybrid architecture — LFM2's gated short convolution blocks required novel steering targets beyond standard attention/MLP pairs.
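The first two steps can be sketched as follows. This is a generic illustration of directional abliteration, not the Abliterix implementation: the refusal direction is the normalized difference of mean activations on harmful vs. benign prompts, and each targeted output projection is orthogonalized so it can no longer write along that direction.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, benign_acts: torch.Tensor) -> torch.Tensor:
    """Unit refusal direction from hidden states of shape (n_prompts, d_model)."""
    d = harmful_acts.mean(dim=0) - benign_acts.mean(dim=0)
    return d / d.norm()

def orthogonalize(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight that writes into the residual stream.

    W: (d_model, d_in). Subtracting the rank-1 component d (d^T W) leaves
    W' = (I - d d^T) W, whose outputs are orthogonal to d for every input.
    """
    return W - torch.outer(d, d @ W)

# Toy check on random data (sizes are illustrative, not the model's)
torch.manual_seed(0)
harmful, benign = torch.randn(400, 64), torch.randn(400, 64)
d = refusal_direction(harmful, benign)
W = torch.randn(64, 32)            # e.g. a conv/attention output projection
W_ablit = orthogonalize(W, d)
x = torch.randn(32)
print(torch.dot(d, W_ablit @ x))   # ≈ 0: the steered layer cannot express the direction
```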
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "wangzhang/LFM2-24B-A2B-abliterated",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("wangzhang/LFM2-24B-A2B-abliterated")

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Hardware Requirements
| Precision | VRAM |
|---|---|
| BF16 | ~48 GB (A100 80GB, H100) |
| INT8 | ~24 GB (A40, RTX 4090) |
| NF4 | ~12 GB (RTX 3090, RTX 4080) |
**Note:** this model requires a single GPU; the convolution layers do not support `accelerate`'s multi-GPU `device_map` splitting.
## Disclaimer
This model is intended for research purposes only. The removal of safety guardrails means the model will comply with requests that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations.
Made with Abliterix