Gemma 4 is still so new that it will take a little time before it works smoothly across the entire HF ecosystem.
But it can be made to work today with a few workarounds:
This is a known upstream compatibility bug, not a one-off mistake in your setup. There is a workaround. There is no public release ETA yet. As of now, PyPI still lists PEFT 0.18.1 as the latest release, while the PEFT main branch has already moved to 0.18.2.dev0. The Gemma 4 support issue is open and currently shows no assignee, no milestone, and no linked PR, so there is no evidence of an imminent packaged fix yet. (PyPI)
What is happening
Your error:
ValueError: Target module Gemma4ClippableLinear(...) is not supported
is happening because PEFT LoRA only knows how to inject adapters into a specific set of module types such as torch.nn.Linear, Embedding, Conv*, Conv1D, and MultiheadAttention. The open PEFT Gemma 4 issue explains that Gemma4ClippableLinear is a wrapper nn.Module, not a subclass of nn.Linear, so PEFT rejects it during LoRA injection. The same issue notes that this happens before exclude_modules can protect you. (GitHub)
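Conceptually, the rejection looks like the sketch below. These are toy stand-in classes, not the real PEFT or transformers code; the point is only that a wrapper class that is not an `nn.Linear` subclass fails the type check before any name-based exclusion runs:

```python
# Simplified stand-ins; the real check lives inside PEFT's LoRA dispatch.
class Linear:                       # stands in for torch.nn.Linear
    pass

class Gemma4ClippableLinear:        # a wrapper module, NOT a Linear subclass
    def __init__(self):
        self.inner = Linear()       # it wraps a linear layer instead of being one

SUPPORTED_TYPES = (Linear,)         # PEFT only knows a fixed set of module types

def inject_lora(module):
    if not isinstance(module, SUPPORTED_TYPES):
        raise ValueError(
            f"Target module {type(module).__name__}(...) is not supported"
        )
    return "adapter injected"

print(inject_lora(Linear()))        # fine
try:
    inject_lora(Gemma4ClippableLinear())
except ValueError as e:
    print(e)                        # same shape of error as the Gemma 4 crash
```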
Why Gemma 4 E2B triggers this
Gemma 4 E2B is not just a plain text decoder. The official model card says Gemma 4 is multimodal, with text and image input across the family, and audio support on the small E2B and E4B variants. The Hugging Face Gemma 4 launch post says the same thing and describes E2B/E4B as the small variants with audio support. That matters because the PEFT issue identifies Gemma4ClippableLinear as being used in the vision/audio encoder. (Hugging Face)
So the practical root cause is usually:
- you are using broad LoRA targeting,
- PEFT walks into Gemma 4's multimodal towers,
- it hits Gemma4ClippableLinear,
- and it crashes. (Hugging Face)
Why this often surprises people
PEFT’s own LoRA docs recommend target_modules="all-linear" for QLoRA-style training. That works well on many standard text-only transformer models. But PEFT also documents that target_modules behaves broadly:
- if you pass a string, it uses regex matching,
- if you pass a list of strings, it uses exact match or suffix match,
- and if you pass "all-linear", it targets all linear/Conv1D modules. (Hugging Face)
On a new multimodal model like Gemma 4 E2B, that convenience becomes a hazard. A setting that is fine for Llama-style text-only training can unintentionally reach vision or audio blocks.
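The hazard is easy to see if you approximate the documented matching rules in a few lines. This is my own simplified sketch of the behavior, not PEFT's actual code: a plain suffix like "q_proj" matches q_proj layers everywhere, including a vision tower, while a fuller pattern does not:

```python
import re

def matches(target_modules, module_name):
    """Approximate PEFT's target_modules matching (simplified sketch)."""
    if isinstance(target_modules, str):
        # A string is treated as a regex matched against the full module name.
        return re.fullmatch(target_modules, module_name) is not None
    # A list means exact match or dotted-suffix match on the module name.
    return any(
        module_name == t or module_name.endswith("." + t)
        for t in target_modules
    )

print(matches(["q_proj"], "model.layers.0.self_attn.q_proj"))    # True
print(matches(["q_proj"], "vision_tower.blocks.0.attn.q_proj"))  # True: suffix match reaches the vision tower too
print(matches(r".*self_attn\.q_proj", "vision_tower.blocks.0.attn.q_proj"))  # False
```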
Will there be an update soon
There will likely be an update eventually, but there is no public timeline I can justify from the current public state. The issue is open. It has no assignee, no milestone, and no branches or pull requests attached. That means “soon” would be speculation. (GitHub)
The best workaround depends on what you are actually fine-tuning
Case 1. You are doing text-only fine-tuning
This is the cleanest path, and probably the best one for most users fine-tuning E2B on text data.
In this case, the best fix is usually:
- Do not use target_modules="all-linear"
- Do not use loose suffix-only target lists like ["q_proj", "v_proj", "o_proj"]
- Build an explicit list of full module names from the text backbone only
- Skip anything under vision or audio towers
- Keep only actual nn.Linear layers in the text path
That works because PEFT will then never touch the unsupported multimodal wrapper modules. This approach is more stable than monkey-patching if your task is text-only. The PEFT docs confirm how target matching works, and the custom-models docs recommend inspecting which modules were actually adapted. (Hugging Face)
A discovery pattern like this is the safest starting point:
import torch.nn as nn

TEXT_SUFFIXES = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)

def discover_text_targets(model):
    out = []
    for name, module in model.named_modules():
        lname = name.lower()
        # Skip multimodal branches
        if "vision" in lname or "audio" in lname:
            continue
        # Keep only typical text LoRA targets
        if not name.endswith(TEXT_SUFFIXES):
            continue
        # Only ordinary linear layers
        if isinstance(module, nn.Linear):
            out.append(name)
    return out

target_modules = discover_text_targets(model)
print("\n".join(target_modules[:50]))
print(f"Found {len(target_modules)} targets")
Then:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# Useful sanity check
print(model.targeted_module_names)
That last check matters. PEFT’s docs explicitly recommend checking trainable parameters and the targeted modules to confirm you only adapted the intended layers. (Hugging Face)
Case 2. You are doing actual multimodal fine-tuning
If you really want to fine-tune image or audio paths too, then the current practical workaround is the one documented in the open PEFT issue: monkey-patch Gemma4ClippableLinear before loading the model so PEFT sees a supported linear-like class. The issue includes this workaround and reports that QLoRA then proceeds normally. (GitHub)
Use this before from_pretrained():
import torch
import torch.nn as nn
from transformers.models.gemma4 import modeling_gemma4

class PatchedClippableLinear(nn.Linear):
    def __init__(self, config, in_features, out_features):
        nn.Linear.__init__(self, in_features, out_features, bias=False)
        self.use_clipped_linears = getattr(config, "use_clipped_linears", False)
        if self.use_clipped_linears:
            self.register_buffer("input_min", torch.tensor(-float("inf")))
            self.register_buffer("input_max", torch.tensor(float("inf")))
            self.register_buffer("output_min", torch.tensor(-float("inf")))
            self.register_buffer("output_max", torch.tensor(float("inf")))

    def forward(self, x):
        if self.use_clipped_linears:
            x = torch.clamp(x, self.input_min, self.input_max)
        out = nn.Linear.forward(self, x)
        if self.use_clipped_linears:
            out = torch.clamp(out, self.output_min, self.output_max)
        return out

modeling_gemma4.Gemma4ClippableLinear = PatchedClippableLinear
This is not elegant. It is just the fastest public workaround.
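One detail worth understanding is why the patch must run before from_pretrained(): the model code looks the class up on its module at construction time, so reassigning the attribute first is enough. A toy illustration with stand-in objects (no transformers involved, names are invented for the demo):

```python
import types

# Stand-in for transformers.models.gemma4.modeling_gemma4
modeling = types.SimpleNamespace()

class OriginalWrapper:          # stands in for Gemma4ClippableLinear
    pass

modeling.ClippableLinear = OriginalWrapper

def build_model():
    # Construction reads the class attribute at call time,
    # which is why patching before loading the model works.
    return modeling.ClippableLinear()

class PatchedLinear:            # stands in for the nn.Linear-based patch
    pass

modeling.ClippableLinear = PatchedLinear  # patch BEFORE building
layer = build_model()
print(type(layer).__name__)     # PatchedLinear
```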
A second issue you are likely to hit right after this one
Even if you get past the PEFT crash, Gemma 4 currently has another training friction point: the open Transformers issue says text-only fine-tuning still requires both token_type_ids and mm_token_type_ids, and that adding zero tensors for both works. That issue also says that for TRL SFT, you typically need a custom collator and remove_unused_columns=False. (GitHub)
Minimal pattern:
import torch

def add_type_ids(batch):
    zeros = torch.zeros_like(batch["input_ids"])
    batch["token_type_ids"] = zeros
    batch["mm_token_type_ids"] = zeros
    return batch
And if you use TRL:
training_args.remove_unused_columns = False
This is a separate bug from the PEFT problem, but it is highly relevant because many people will hit it immediately after fixing the LoRA injection error. (GitHub)
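If you go the TRL route with a custom collator, it can be as simple as padding plus the two zero fields. Here is a pure-Python sketch under that assumption; the field names follow the workaround above, but pad_token_id and the final list-to-tensor conversion depend on your tokenizer and framework:

```python
def text_only_collator(examples, pad_token_id=0):
    """Sketch of a collator that adds the extra type-id fields."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [],
             "token_type_ids": [], "mm_token_type_ids": []}
    for ex in examples:
        ids = ex["input_ids"]
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Zero values for both type-id fields, per the workaround
        batch["token_type_ids"].append([0] * max_len)
        batch["mm_token_type_ids"].append([0] * max_len)
    return batch  # convert lists to tensors in real training code

batch = text_only_collator([{"input_ids": [5, 6, 7]}, {"input_ids": [5]}])
print(batch["mm_token_type_ids"])  # [[0, 0, 0], [0, 0, 0]]
```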
Version guidance
Right now the version picture is:
- PEFT 0.18.1 is the latest release on PyPI. (PyPI)
- PEFT main already says 0.18.2.dev0, but that is not the same as a released fix. (GitHub)
- Transformers 5.5.0 is the latest PyPI release, published on April 2, 2026. (PyPI)
So if you are on peft==0.18.1, you are not missing a newer public PEFT release.
What I would do in your exact situation
If your data is text-only
I would do this, in order:
- Upgrade to transformers==5.5.0
- Keep peft==0.18.1
- Stop using all-linear
- Build exact full target names for text-only linear modules
- Verify them with print_trainable_parameters() and targeted_module_names
- Add zero token_type_ids and mm_token_type_ids in the collator (PyPI)
If you need multimodal LoRA now
Use the monkey-patch, then add the type-id workaround, then proceed. For multimodal fine-tuning specifically, Hugging Face’s Gemma 4 launch post says AutoModelForMultimodalLM is the lower-level class especially useful for fine-tuning, and the built-in chat template should be used to avoid prompt-formatting mistakes. Google’s official Gemma guides also provide separate text and vision QLoRA tutorials. (Hugging Face)
Bottom line
Your crash is caused by a real PEFT incompatibility with Gemma 4’s Gemma4ClippableLinear modules. There is no public ETA for a packaged fix. The best current workaround is:
- text-only training: restrict LoRA to exact text decoder nn.Linear module names
- multimodal training: monkey-patch Gemma4ClippableLinear before loading the model
- in both cases, be ready to also add token_type_ids and mm_token_type_ids during training (GitHub)