PEFT 0.18.1 crashing when fine-tuning

Hi, PEFT version 0.18.1 is crashing when I try to fine-tune google/gemma-4-E2B; the error message is shown below. I checked, and 0.18.1 is the latest version. Will there be an update soon, or is there a workaround? I'd appreciate any help. Thanks!

ValueError: Target module Gemma4ClippableLinear(
(linear): Linear(in_features=768, out_features=768, bias=False)
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv1d`, `torch.nn.Conv2d`, `torch.nn.Conv3d`, `transformers.pytorch_utils.Conv1D`, `torch.nn.MultiheadAttention.`.


Gemma 4 is so new (still in its infancy, really) that I think it'll take a little time before it works perfectly across the entire HF ecosystem...

But it looks like it can be made to work somehow:


This is a known upstream compatibility bug, not a one-off mistake in your setup. There is a workaround, but no public release ETA yet. As of now, PyPI still lists PEFT 0.18.1 as the latest release, while the PEFT main branch has already moved to 0.18.2.dev0. The Gemma 4 support issue is open and currently shows no assignee, no milestone, and no linked PR, so there is no evidence of an imminent packaged fix. (PyPI)

What is happening

Your error:

ValueError: Target module Gemma4ClippableLinear(...) is not supported

is happening because PEFT LoRA only knows how to inject adapters into a specific set of module types such as torch.nn.Linear, Embedding, Conv*, Conv1D, and MultiheadAttention. The open PEFT Gemma 4 issue explains that Gemma4ClippableLinear is a wrapper nn.Module, not a subclass of nn.Linear, so PEFT rejects it during LoRA injection. The same issue notes that this happens before exclude_modules can protect you. (GitHub)
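PEFT's check is essentially an isinstance test against that supported-types list, and a wrapper module that merely *contains* an nn.Linear does not pass it. A toy sketch of the failure mode (ClippableLinearLike is a made-up stand-in, not the real Gemma 4 class):

```python
import torch.nn as nn

class ClippableLinearLike(nn.Module):
    """Made-up stand-in for a wrapper like Gemma4ClippableLinear."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x):
        return self.linear(x)

wrapper = ClippableLinearLike(768, 768)
print(isinstance(wrapper, nn.Linear))         # False: PEFT rejects the wrapper
print(isinstance(wrapper.linear, nn.Linear))  # True, but PEFT never looks inside
```

The inner Linear is perfectly adaptable; PEFT just never unwraps it, which is why the injection fails at the wrapper.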

Why Gemma 4 E2B triggers this

Gemma 4 E2B is not just a plain text decoder. The official model card says Gemma 4 is multimodal, with text and image input across the family, and audio support on the small E2B and E4B variants. The Hugging Face Gemma 4 launch post says the same thing and describes E2B/E4B as the small variants with audio support. That matters because the PEFT issue identifies Gemma4ClippableLinear as being used in the vision/audio encoder. (Hugging Face)

So the practical root cause is usually:

  • you are using broad LoRA targeting,
  • PEFT walks into Gemma 4’s multimodal towers,
  • it hits Gemma4ClippableLinear, and
  • it crashes. (Hugging Face)

Why this often surprises people

PEFT’s own LoRA docs recommend target_modules="all-linear" for QLoRA-style training. That works well on many standard text-only transformer models. But PEFT also documents that target_modules behaves broadly:

  • if you pass a string, it uses regex matching,
  • if you pass a list of strings, it uses exact match or suffix match,
  • and if you pass "all-linear", it targets all linear/Conv1D modules. (Hugging Face)

On a new multimodal model like Gemma 4 E2B, that convenience becomes a hazard. A setting that is fine for Llama-style text-only training can unintentionally reach vision or audio blocks.
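A quick illustration of that hazard (the module names here are hypothetical, just shaped like a multimodal model's layout):

```python
# Hypothetical module names shaped like a multimodal model's layout
names = [
    "language_model.layers.0.self_attn.q_proj",
    "vision_tower.blocks.0.attn.q_proj",   # not a text layer
]

# Suffix matching, roughly what PEFT does for a list of target strings
matched = [n for n in names if n.endswith("q_proj")]
print(matched)
# ['language_model.layers.0.self_attn.q_proj', 'vision_tower.blocks.0.attn.q_proj']
```

Both names match, including the vision branch. That is exactly how a loose target list wanders into modules PEFT cannot adapt.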

Will there be an update soon

There will likely be an update eventually, but there is no public timeline I can justify from the current public state. The issue is open. It has no assignee, no milestone, and no branches or pull requests attached. That means “soon” would be speculation. (GitHub)

The best workaround depends on what you are actually fine-tuning

Case 1. You are doing text-only fine-tuning

This is the cleanest path, and probably the best one for most users fine-tuning E2B on text data.

In this case, the best fix is usually:

  1. Do not use target_modules="all-linear"
  2. Do not use loose suffix-only target lists like ["q_proj", "v_proj", "o_proj"]
  3. Build an explicit list of full module names from the text backbone only
  4. Skip anything under vision or audio towers
  5. Keep only actual nn.Linear layers in the text path

That works because PEFT will then never touch the unsupported multimodal wrapper modules. This approach is more stable than monkey-patching if your task is text-only. The PEFT docs confirm how target matching works, and the custom-models docs recommend inspecting which modules were actually adapted. (Hugging Face)

A discovery pattern like this is the safest starting point:

import torch.nn as nn

TEXT_SUFFIXES = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)

def discover_text_targets(model):
    out = []
    for name, module in model.named_modules():
        lname = name.lower()

        # Skip multimodal branches
        if "vision" in lname or "audio" in lname:
            continue

        # Keep only typical text LoRA targets (match the final name component
        # exactly, so e.g. "extra_q_proj" is not caught by accident)
        if name.rsplit(".", 1)[-1] not in TEXT_SUFFIXES:
            continue

        # Only ordinary linear layers
        if isinstance(module, nn.Linear):
            out.append(name)

    return out

target_modules = discover_text_targets(model)
print("\n".join(target_modules[:50]))
print(f"Found {len(target_modules)} targets")

Then:

from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules,
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# Useful sanity check
print(model.targeted_module_names)

That last check matters. PEFT’s docs explicitly recommend checking trainable parameters and the targeted modules to confirm you only adapted the intended layers. (Hugging Face)

Case 2. You are doing actual multimodal fine-tuning

If you really want to fine-tune image or audio paths too, then the current practical workaround is the one documented in the open PEFT issue: monkey-patch Gemma4ClippableLinear before loading the model so PEFT sees a supported linear-like class. The issue includes this workaround and reports that QLoRA then proceeds normally. (GitHub)

Use this before from_pretrained():

import torch
import torch.nn as nn
from transformers.models.gemma4 import modeling_gemma4

class PatchedClippableLinear(nn.Linear):
    def __init__(self, config, in_features, out_features):
        nn.Linear.__init__(self, in_features, out_features, bias=False)
        self.use_clipped_linears = getattr(config, "use_clipped_linears", False)

        if self.use_clipped_linears:
            self.register_buffer("input_min", torch.tensor(-float("inf")))
            self.register_buffer("input_max", torch.tensor(float("inf")))
            self.register_buffer("output_min", torch.tensor(-float("inf")))
            self.register_buffer("output_max", torch.tensor(float("inf")))

    def forward(self, x):
        if self.use_clipped_linears:
            x = torch.clamp(x, self.input_min, self.input_max)

        out = nn.Linear.forward(self, x)

        if self.use_clipped_linears:
            out = torch.clamp(out, self.output_min, self.output_max)

        return out

modeling_gemma4.Gemma4ClippableLinear = PatchedClippableLinear

This is not elegant. It is just the fastest public workaround.
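The patch-before-load ordering is the important part: from_pretrained() resolves classes off the module at load time, so replacing the attribute afterwards does nothing. A self-contained sketch of the principle with toy classes (not the real transformers internals):

```python
import types

# Toy "library" module standing in for transformers.models.gemma4
lib = types.ModuleType("toy_modeling")

class WrapperLinear:      # stand-in for the unsupported wrapper class
    supported = False

class PatchedLinear:      # stand-in for the nn.Linear-based replacement
    supported = True

lib.WrapperLinear = WrapperLinear

def load_model(module):
    # simulates from_pretrained() resolving the class at load time
    return module.WrapperLinear()

print(load_model(lib).supported)   # False: loaded before patching

lib.WrapperLinear = PatchedLinear  # patch first, then load
print(load_model(lib).supported)   # True: the patch is picked up
```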

A second issue you are likely to hit right after this one

Even if you get past the PEFT crash, Gemma 4 currently has another training friction point: the open Transformers issue says text-only fine-tuning still requires both token_type_ids and mm_token_type_ids, and that adding zero tensors for both works. That issue also says that for TRL SFT, you typically need a custom collator and remove_unused_columns=False. (GitHub)

Minimal pattern:

import torch

def add_type_ids(batch):
    zeros = torch.zeros_like(batch["input_ids"])
    batch["token_type_ids"] = zeros
    batch["mm_token_type_ids"] = zeros
    return batch

And if you use TRL:

training_args.remove_unused_columns = False

This is a separate bug from the PEFT problem, but it is highly relevant because many people will hit it immediately after fixing the LoRA injection error. (GitHub)
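For TRL SFT, the padding and the zero type-id workaround combine naturally in one custom collator. A minimal sketch (padding with token id 0 is an assumption; use your tokenizer's actual pad id):

```python
import torch

def collate(examples, pad_token_id=0):
    """Pad input_ids to the batch max length and add zero type-id tensors."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    input_ids, attention_mask = [], []
    for ex in examples:
        ids = list(ex["input_ids"])
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)

    batch = {
        "input_ids": torch.tensor(input_ids),
        "attention_mask": torch.tensor(attention_mask),
    }
    batch["labels"] = batch["input_ids"].clone()
    # The Gemma 4 workaround: zero tensors for both type-id fields
    zeros = torch.zeros_like(batch["input_ids"])
    batch["token_type_ids"] = zeros
    batch["mm_token_type_ids"] = zeros
    return batch

batch = collate([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}])
print(batch["input_ids"].shape)          # torch.Size([2, 3])
print(batch["mm_token_type_ids"].sum())  # tensor(0)
```

Pass it as data_collator and keep remove_unused_columns=False so the extra fields survive into the forward pass.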

Version guidance

Right now the version picture is:

  • PEFT 0.18.1 is the latest release on PyPI. (PyPI)
  • PEFT main already says 0.18.2.dev0, but that is not the same as a released fix. (GitHub)
  • Transformers 5.5.0 is the latest PyPI release, published on April 2, 2026. (PyPI)

So if you are on peft==0.18.1, you are not missing a newer public PEFT release.

What I would do in your exact situation

If your data is text-only

I would do this, in order:

  1. Upgrade to transformers==5.5.0
  2. Keep peft==0.18.1
  3. Stop using all-linear
  4. Build exact full target names for text-only linear modules
  5. Verify them with print_trainable_parameters() and targeted_module_names
  6. Add zero token_type_ids and mm_token_type_ids in the collator (PyPI)

If you need multimodal LoRA now

Use the monkey-patch, then add the type-id workaround, then proceed. For multimodal fine-tuning specifically, Hugging Face’s Gemma 4 launch post says AutoModelForMultimodalLM is the lower-level class especially useful for fine-tuning, and the built-in chat template should be used to avoid prompt-formatting mistakes. Google’s official Gemma guides also provide separate text and vision QLoRA tutorials. (Hugging Face)

Bottom line

Your crash is caused by a real PEFT incompatibility with Gemma 4’s Gemma4ClippableLinear modules. There is no public ETA for a packaged fix. The best current workaround is:

  • text-only training: restrict LoRA to exact text decoder nn.Linear module names
  • multimodal training: monkey-patch Gemma4ClippableLinear before loading the model
  • in both cases, be ready to also add token_type_ids and mm_token_type_ids during training (GitHub)

This is a brilliant explanation! Thank you so much @John6666 for your kind support and taking the time to reply. I really appreciate it! The issue is very clear to me now. I’m doing text-only fine-tuning and will follow the solution you recommended. Thank you so much!


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.

Note that the simplest solution is to define target_modules correctly; see this PEFT discussion.
