PEFT documentation
LoRA
Low-Rank Adaptation (LoRA) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in the attention layers. This drastically reduces the number of parameters that need to be fine-tuned.
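To make the parameter saving concrete, the sketch below counts trainable parameters for a single weight matrix, once for full fine-tuning and once for the two low-rank factors. The layer size (4096 x 4096) and the rank of 8 are illustrative assumptions, not values taken from a specific model, and the helper names are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


d_out, d_in, r = 4096, 4096, 8  # illustrative sizes (assumption)
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, r)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4%}")
```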
The abstract from the paper is:
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.
LoraConfig
class peft.LoraConfig
< source >( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False r: int = 8 target_modules: Optional[Union[list[str], str]] = None exclude_modules: Optional[Union[list[str], str]] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: Literal['none', 'all', 'lora_only'] = 'none' use_rslora: bool = False modules_to_save: Optional[list[str]] = None init_lora_weights: bool | Literal['gaussian', 'eva', 'olora', 'pissa', 'pissa_niter_[number of iters]', 'corda', 'loftq', 'orthogonal'] = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None rank_pattern: Optional[dict] = <factory> alpha_pattern: Optional[dict] = <factory> megatron_config: Optional[dict] = None megatron_core: Optional[str] = 'megatron.core' trainable_token_indices: Optional[Union[list[int], dict[str, list[int]]]] = None loftq_config: Union[LoftQConfig, dict] = <factory> eva_config: Optional[EvaConfig] = None corda_config: Optional[CordaConfig] = None use_dora: bool = False use_qalora: bool = False qalora_group_size: int = 16 layer_replication: Optional[list[tuple[int, int]]] = None runtime_config: LoraRuntimeConfig = <factory> lora_bias: bool = False target_parameters: Optional[list[str]] = None )
Parameters
-  r (int) — Lora attention dimension (the “rank”).
-  target_modules (Optional[Union[List[str], str]]) — The names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as ‘all-linear’, then all linear/Conv1D modules are chosen (if the model is a PreTrainedModel, the output layer excluded). If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually. To avoid targeting any modules (because you want to apply target_parameters), set target_modules=[].
-  exclude_modules (Optional[Union[List[str], str]]) — The names of the modules to not apply the adapter. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings.
-  lora_alpha (int) — The alpha parameter for Lora scaling.
-  lora_dropout (float) — The dropout probability for Lora layers.
-  fan_in_fan_out (bool) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses Conv1D which stores weights like (fan_in, fan_out) and hence this should be set to True.
-  bias (str) — Bias type for LoRA. Can be ‘none’, ‘all’ or ‘lora_only’. If ‘all’ or ‘lora_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
-  use_rslora (bool) — When set to True, uses Rank-Stabilized LoRA which sets the adapter scaling factor to lora_alpha/math.sqrt(r), since it was proven to work better. Otherwise, it will use the original default value of lora_alpha/r.
-  modules_to_save (List[str]) — List of modules apart from adapter layers to be set as trainable and saved in the final checkpoint.
-  init_lora_weights (bool|Literal["gaussian", "eva", "olora", "pissa", "pissa_niter_[number of iters]", "corda", "loftq", "orthogonal"]) — How to initialize the weights of the adapter layers. Passing True (default) results in the default initialization from the reference implementation from Microsoft, with the LoRA B weight being set to 0. This means that without further training, the LoRA adapter will be a no-op. Setting the initialization to False leads to random initialization of LoRA A and B, meaning that LoRA is not a no-op before training; this setting is intended for debugging purposes. Passing ‘gaussian’ results in Gaussian initialization scaled by the LoRA rank for linear and layers. Pass 'loftq' to use LoftQ initialization. Passing 'eva' results in a data-driven initialization of Explained Variance Adaptation. EVA initializes LoRA based on the SVD of layer input activations and achieves SOTA performance due to its ability to adapt to the finetuning data. Pass 'olora' to use OLoRA initialization. Passing 'pissa' results in the initialization of Principal Singular values and Singular vectors Adaptation (PiSSA, https://huggingface.co/papers/2404.02948), which converges more rapidly than LoRA and ultimately achieves superior performance. Moreover, PiSSA reduces the quantization error compared to QLoRA, leading to further enhancements. Passing 'pissa_niter_[number of iters]' initiates Fast-SVD-based PiSSA initialization, where [number of iters] indicates the number of subspace iterations to perform FSVD, and must be a nonnegative integer. When [number of iters] is set to 16, it can complete the initialization of a 7B model within seconds, and the training effect is approximately equivalent to using SVD. Passing 'corda' results in the initialization of Context-Oriented Decomposition Adaptation, which converges even more rapidly than PiSSA in Instruction-Previewed Mode, and preserves world knowledge better than LoRA in Knowledge-Preserved Mode. Passing "orthogonal" results in LoRA A and B being initialized orthogonally; in this, it resembles "olora", but the base weights are left untouched (requires r to be even, only supported for linear layers for now).
-  layers_to_transform (Union[List[int], int]) — The layer indices to transform. If a list of ints is passed, it will apply the adapter to the layer indices that are specified in this list. If a single integer is passed, it will apply the transformations on the layer at this index.
-  layers_pattern (Optional[Union[List[str], str]]) — The layer pattern name, used only if layers_to_transform is different from None. This should target the nn.ModuleList of the model, which is often called 'layers' or 'h'.
-  rank_pattern (dict) — The mapping from layer names or regular expressions to ranks which are different from the default rank specified by r. For example, {'^model.decoder.layers.0.encoder_attn.k_proj': 16}.
-  alpha_pattern (dict) — The mapping from layer names or regular expressions to alphas which are different from the default alpha specified by lora_alpha. For example, {'^model.decoder.layers.0.encoder_attn.k_proj': 16}.
-  megatron_config (Optional[dict]) — The TransformerConfig arguments for Megatron. It is used to create LoRA’s parallel linear layer. You can get it with core_transformer_config_from_args(get_args()), both functions being from Megatron. The arguments will be used to initialize the TransformerConfig of Megatron. You need to specify this parameter when you want to apply LoRA to the ColumnParallelLinear and RowParallelLinear layers of Megatron.
-  megatron_core (Optional[str]) — The core module from Megatron to use, defaults to "megatron.core".
-  trainable_token_indices (Optional[Union[List[int], dict[str, List[int]]]]) — Lets you specify which token indices to selectively fine-tune without re-training the whole embedding matrix, using the peft.TrainableTokensModel method. You can specify token indices in two ways. Either you specify a list of indices which will then target the model’s input embedding layer (or, if not found, embed_tokens). Alternatively, you can specify a dictionary where the key is the name of the embedding module and the values are the list of token indices, e.g. {'embed_tokens': [0, 1, ...]}. Note that training with FSDP requires use_orig_params=True to avoid issues with non-uniform requires_grad.
-  loftq_config (Optional[LoftQConfig]) — The configuration of LoftQ. If this is not None, then LoftQ will be used to quantize the backbone weights and initialize Lora layers. Also pass init_lora_weights='loftq'. Note that you should not pass a quantized model in this case, as LoftQ will quantize the model itself.
-  eva_config (Optional[EvaConfig]) — The configuration of EVA. At a minimum the dataset argument needs to be set (use the same dataset as for finetuning).
-  corda_config (Optional[CordaConfig]) — The configuration of CorDA. If this is not None, then CorDA will be used to build the adapter layers. Also pass init_lora_weights='corda'.
-  use_dora (bool) — Enable ‘Weight-Decomposed Low-Rank Adaptation’ (DoRA). This technique decomposes the updates of the weights into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the magnitude is handled by a separate learnable parameter. This can improve the performance of LoRA especially at low ranks. Right now, DoRA only supports linear and Conv2D layers. DoRA introduces a bigger overhead than pure LoRA, so it is recommended to merge weights for inference. For more information, see https://huggingface.co/papers/2402.09353.
-  layer_replication (List[Tuple[int, int]]) — Build a new stack of layers by stacking the original model layers according to the ranges specified. This allows expanding (or shrinking) the model without duplicating the base model weights. The new layers will all have separate LoRA adapters attached to them.
-  runtime_config (LoraRuntimeConfig) — Runtime configurations (which are not saved or restored).
-  lora_bias (bool) — Defaults to False. Whether to enable the bias term for the LoRA B parameter. Typically, this should be disabled. The main use case for this is when the LoRA weights were extracted from fully fine-tuned parameters so the bias of those parameters can be taken into account.
-  target_parameters (List[str], optional) — List of parameter names or regex expression of the parameter names to replace with LoRA. This argument behaves similarly to target_modules, except that the parameter name should be passed. Generally, you should use target_modules to target the module (e.g. nn.Linear). However, in some circumstances, this is not possible. E.g., in many mixture of expert (MoE) layers in HF Transformers, instead of using nn.Linear, an nn.Parameter is used. PEFT normally overwrites the forward method for LoRA, but for nn.Parameter, there is none. Therefore, to apply LoRA to that parameter, it needs to be targeted with target_parameters. As an example, for Llama4, you can pass: target_parameters=['feed_forward.experts.gate_up_proj', 'feed_forward.experts.down_proj']. Passing a string for regex matching is not implemented yet.
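The two scaling rules named under use_rslora in the parameter list above can be compared directly. The helper below is a small illustration of the formulas lora_alpha/r and lora_alpha/math.sqrt(r); the function name lora_scaling is hypothetical and is not part of the PEFT API.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Adapter scaling factor: lora_alpha / r, or lora_alpha / sqrt(r) with rsLoRA."""
    if r <= 0:
        raise ValueError("r must be a positive integer")
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# With a fixed lora_alpha, the default factor shrinks linearly in r,
# while the rank-stabilized factor shrinks only with sqrt(r).
for rank in (8, 64):
    print(rank, lora_scaling(16, rank), lora_scaling(16, rank, use_rslora=True))
```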
This is the configuration class to store the configuration of a LoraModel.
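The matching rules described for target_modules and exclude_modules (a string is treated as a regex, a list is matched exactly or by suffix) can be sketched as follows. This is an illustration only: the helper names are hypothetical, the use of a full-string regex match and of a dot boundary in the suffix check are assumptions, and the special value 'all-linear' is not handled.

```python
import re
from typing import Optional, Sequence, Union

Spec = Union[str, Sequence[str]]


def matches(name: str, spec: Spec) -> bool:
    """Return True if a module name is selected by a string (regex) or list spec."""
    if isinstance(spec, str):
        # Assumption: the regex has to match the whole module name.
        return re.fullmatch(spec, name) is not None
    # Assumption: a suffix match is anchored at a "." boundary.
    return any(name == s or name.endswith("." + s) for s in spec)


def select_modules(names: Sequence[str], target_modules: Spec,
                   exclude_modules: Optional[Spec] = None) -> list:
    """Keep names matched by target_modules and not matched by exclude_modules."""
    selected = []
    for name in names:
        if exclude_modules is not None and matches(name, exclude_modules):
            continue
        if matches(name, target_modules):
            selected.append(name)
    return selected


# Hypothetical module names for illustration
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.up_proj",
]
by_list = select_modules(names, ["q_proj", "v_proj"])
by_regex = select_modules(names, r".*\.layers\.1\..*_proj", exclude_modules=["up_proj"])
print(by_list)
print(by_regex)
```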
to_dict
Returns the configuration for your adapter model as a dictionary. Removes runtime configurations.
LoraModel
class peft.LoraModel
< source >( model peft_config: Union[PeftConfig, dict[str, PeftConfig]] adapter_name: str low_cpu_mem_usage: bool = False state_dict: Optional[dict[str, torch.Tensor]] = None  ) → torch.nn.Module
Parameters
-  model (torch.nn.Module) — The model to be adapted.
- config (LoraConfig) — The configuration of the Lora model.
-  adapter_name (str) — The name of the adapter, defaults to "default".
-  low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns
torch.nn.Module
The Lora model.
Creates Low Rank Adapter (LoRA) model from a pretrained transformers model.
The method is described in detail in https://huggingface.co/papers/2106.09685.
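The core update rule can be sketched in plain Python: the adapted layer computes the base output plus a scaled low-rank correction, h = W x + scaling * B (A x). With B initialized to zero, the correction vanishes, which is why a freshly created adapter leaves the model output unchanged. The helper names and matrix sizes below are hypothetical and chosen only for illustration; this is not how PEFT implements its layers.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scaling):
    """h = W x + scaling * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d_out, d_in, r = 3, 4, 2  # illustrative sizes (assumption)
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]  # random init
B = [[0.0] * r for _ in range(d_out)]  # zero init, so B (A x) is zero
x = [1.0, -2.0, 0.5, 3.0]

untrained = lora_forward(W, A, B, x, scaling=2.0)
print(untrained == matvec(W, x))  # the untrained adapter does not change the output
```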
Example:
>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import LoraModel, LoraConfig
>>> config = LoraConfig(
...     task_type="SEQ_2_SEQ_LM",
...     r=8,
...     lora_alpha=32,
...     target_modules=["q", "v"],
...     lora_dropout=0.01,
... )
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> lora_model = LoraModel(model, config, "default")

>>> import torch
>>> import transformers
>>> from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
>>> rank = ...
>>> target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"]
>>> config = LoraConfig(
...     r=4, lora_alpha=16, target_modules=target_modules, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
... )
>>> quantization_config = transformers.BitsAndBytesConfig(load_in_8bit=True)
>>> tokenizer = transformers.AutoTokenizer.from_pretrained(
...     "kakaobrain/kogpt",
...     revision="KoGPT6B-ryan1.5b-float16",  # or float32 version: revision=KoGPT6B-ryan1.5b
...     bos_token="[BOS]",
...     eos_token="[EOS]",
...     unk_token="[UNK]",
...     pad_token="[PAD]",
...     mask_token="[MASK]",
... )
>>> model = transformers.GPTJForCausalLM.from_pretrained(
...     "kakaobrain/kogpt",
...     revision="KoGPT6B-ryan1.5b-float16",  # or float32 version: revision=KoGPT6B-ryan1.5b
...     pad_token_id=tokenizer.eos_token_id,
...     use_cache=False,
...     device_map={"": rank},
...     torch_dtype=torch.float16,
...     quantization_config=quantization_config,
... )
>>> model = prepare_model_for_kbit_training(model)
>>> lora_model = get_peft_model(model, config)
Attributes:
- model (PreTrainedModel) — The model to be adapted.
- peft_config (LoraConfig): The configuration of the Lora model.
add_weighted_adapter
< source >( adapters: list[str] weights: list[float] adapter_name: str combination_type: str = 'svd' svd_rank: int | None = None svd_clamp: int | None = None svd_full_matrices: bool = True svd_driver: str | None = None density: float | None = None majority_sign_method: Literal['total', 'frequency'] = 'total' )
Parameters
-  adapters (list) — List of adapter names to be merged.
-  weights (list) — List of weights for each adapter.
-  adapter_name (str) — Name of the new adapter.
-  combination_type (str) — The merging type can be one of [svd, linear, cat, ties, ties_svd, dare_ties, dare_linear, dare_ties_svd, dare_linear_svd, magnitude_prune, magnitude_prune_svd]. When using the cat combination_type, the rank of the resulting adapter is equal to the sum of all adapters ranks (the mixed adapter may be too big and result in OOM errors).
-  svd_rank (int, optional) — Rank of output adapter for svd. If None provided, will use max rank of merging adapters.
-  svd_clamp (float, optional) — A quantile threshold for clamping SVD decomposition output. If None is provided, do not perform clamping. Defaults to None.
-  svd_full_matrices (bool, optional) — Controls whether to compute the full or reduced SVD, and consequently, the shape of the returned tensors U and Vh. Defaults to True.
-  svd_driver (str, optional) — Name of the cuSOLVER method to be used. This keyword argument only works when merging on CUDA. Can be one of [None, gesvd, gesvdj, gesvda]. For more info please refer to the torch.linalg.svd documentation. Defaults to None.
-  density (float, optional) — Value between 0 and 1. 0 means all values are pruned and 1 means no values are pruned. Should be used with [ties, ties_svd, dare_ties, dare_linear, dare_ties_svd, dare_linear_svd, magnitude_prune, magnitude_prune_svd].
-  majority_sign_method (str) — The method, should be one of [“total”, “frequency”], to use to get the magnitude of the sign values. Should be used with [ties, ties_svd, dare_ties, dare_ties_svd].
This method adds a new adapter by merging the given adapters with the given weights.
When using the cat combination_type you should be aware that rank of the resulting adapter will be equal to
the sum of all adapters ranks. So it’s possible that the mixed adapter may become too big and result in OOM
errors.
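The rank growth with the cat combination type follows from how concatenation works: stacking the A matrices along the rank dimension and placing the B matrices side by side yields one adapter whose product equals the weighted sum of the individual updates. The sketch below shows this with tiny hand-written matrices. It is an assumption-laden illustration (hypothetical helper names, per-adapter scaling factors omitted, weights applied to A), not PEFT's implementation.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def cat_merge(adapters, weights):
    """Concatenate (A, B) pairs; A is r_i x d_in, B is d_out x r_i."""
    A_cat, B_cat = [], None
    for (A, B), w in zip(adapters, weights):
        # Stack weighted rows of A: the ranks of the adapters add up.
        A_cat.extend([[w * a for a in row] for row in A])
        # Append the columns of B next to the columns collected so far.
        if B_cat is None:
            B_cat = [list(row) for row in B]
        else:
            B_cat = [acc + list(row) for acc, row in zip(B_cat, B)]
    return A_cat, B_cat


adapter_1 = ([[1.0, 0.0]], [[1.0], [2.0]])                          # rank 1
adapter_2 = ([[0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])    # rank 2

A_cat, B_cat = cat_merge([adapter_1, adapter_2], weights=[0.5, 2.0])
merged_delta = matmul(B_cat, A_cat)
print(len(A_cat))      # rank of the merged adapter: 1 + 2
print(merged_delta)    # equals 0.5 * B1 A1 + 2.0 * B2 A2
```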
delete_adapter
< source >( adapter_name: str )
Deletes an existing adapter.
disable_adapter_layers
Disable all adapters.
When disabling all adapters, the model output corresponds to the output of the base model.
enable_adapter_layers
Enable all adapters.
Call this if you have previously disabled all adapters and want to re-enable them.
merge_and_unload
< source >( progressbar: bool = False safe_merge: bool = False adapter_names: Optional[list[str]] = None )
Parameters
-  progressbar (bool) — whether to show a progressbar indicating the unload and merge process
-  safe_merge (bool) — whether to activate the safe merging check to check if there is any potential NaN in the adapter weights
-  adapter_names (List[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.
This method merges the LoRA layers into the base model. This is needed if someone wants to use the base model as a standalone model.
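Conceptually, merging folds the low-rank update into the base weight, W_merged = W + scaling * B A, so that a plain matrix multiplication reproduces the adapted output without separate adapter layers. The sketch below checks this equivalence on a 2 x 2 example. The helper names and values are hypothetical, and it illustrates only the arithmetic, not PEFT's merging code.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def merge(W, A, B, scaling):
    """Return W + scaling * (B @ A)."""
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, -1.0]]       # r x d_in with r = 1
B = [[2.0], [0.5]]      # d_out x r
scaling = 2.0
x = [1.0, 2.0]

# Output with a separate adapter: W x + scaling * B (A x)
adapter_out = [b + scaling * u for b, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Output after merging the adapter into the weight
merged_out = matvec(merge(W, A, B, scaling), x)
print(adapter_out, merged_out)
```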
Example:
>>> from transformers import AutoModelForCausalLM
>>> from peft import PeftModel
>>> base_model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b")
>>> peft_model_id = "smangrul/falcon-40B-int4-peft-lora-sfttrainer-sample"
>>> model = PeftModel.from_pretrained(base_model, peft_model_id)
>>> merged_model = model.merge_and_unload()
set_adapter
< source >( adapter_name: str | list[str] )
Set the active adapter(s).
Additionally, this function will set the specified adapters to trainable (i.e., requires_grad=True). If this is not desired, use the following code.
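A minimal sketch of such code, assuming the adapter parameters can be recognized by a substring of their name (here "lora_", which is an assumption): iterate over named_parameters() and switch requires_grad off again. The helper freeze_matching and the TinyModel stand-in are hypothetical; the stand-in only exists so the sketch runs without PyTorch, and the same loop works on any object that exposes named_parameters().

```python
from types import SimpleNamespace


def freeze_matching(model, keyword: str) -> list:
    """Set requires_grad=False on every parameter whose name contains `keyword`."""
    frozen = []
    for name, param in model.named_parameters():
        if keyword in name:
            param.requires_grad = False
            frozen.append(name)
    return frozen


class TinyModel:
    """Hypothetical stand-in for a model that exposes named_parameters()."""

    def __init__(self):
        self._params = {
            "layer.weight": SimpleNamespace(requires_grad=False),
            "layer.lora_A.default.weight": SimpleNamespace(requires_grad=True),
            "layer.lora_B.default.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return self._params.items()


model = TinyModel()
frozen_names = freeze_matching(model, "lora_")
print(frozen_names)
```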
subtract_mutated_init
< source >( output_state_dict: dict[str, torch.Tensor] adapter_name: str kwargs = None )
This function can calculate the updates of the PiSSA/CorDA/OLoRA by comparing the parameters of the
PiSSA/CorDA/OLoRA adapter in output_state_dict with the initial values of PiSSA/CorDA/OLoRA in
adapter_name, thus converting PiSSA/CorDA/OLoRA to LoRA.
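The conversion rests on a simple identity: the difference between the trained and the initial update, B_t A_t - B_0 A_0, can itself be written as a single product of concatenated factors, which gives a plain LoRA adapter of rank 2r. The sketch below demonstrates the identity on small integer matrices. The function name to_plain_lora is hypothetical, and which of the two factors carries the minus sign in PEFT is an assumption here.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def to_plain_lora(A_trained, B_trained, A_init, B_init):
    """Build (A_new, B_new) with B_new @ A_new == B_trained @ A_trained - B_init @ A_init."""
    # Stack the trained rows and the negated initial rows: rank r + r = 2r.
    A_new = [list(row) for row in A_trained] + [[-a for a in row] for row in A_init]
    # Place the trained and initial columns of B side by side.
    B_new = [list(bt) + list(b0) for bt, b0 in zip(B_trained, B_init)]
    return A_new, B_new


A_init, B_init = [[1, 1]], [[1], [2]]          # initial (mutated) adapter, r = 1
A_trained, B_trained = [[2, 0]], [[1], [3]]    # adapter after training

A_new, B_new = to_plain_lora(A_trained, B_trained, A_init, B_init)
delta = matmul(B_new, A_new)
print(len(A_new), delta)
```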
unload
Gets back the base model by removing all the LoRA modules without merging. This gives back the original base model.
Utility
LoftQ
peft.replace_lora_weights_loftq
< source >( peft_model model_path: Optional[str] = None adapter_name: str = 'default' callback: Optional[Callable[[torch.nn.Module, str], bool]] = None )
Parameters
-  peft_model (PeftModel) — The model to replace the weights of. Must be a quantized PEFT model with LoRA layers.
-  model_path (Optional[str]) — The path to the model safetensors file. If the model is a Hugging Face model, this will be inferred from the model’s config. Otherwise, it must be provided.
-  adapter_name (str) — The name of the adapter to replace the weights of. The default adapter name is “default”.
-  callback (Optional[Callable[[PeftModel, str], bool]]) — A callback function that will be called after each module is replaced. The callback function should take the model and the name of the current module as input and return a boolean indicating whether the replacement should be kept. If the callback returns False, the replacement will be rolled back. This can be very useful to confirm that the LoftQ initialization actually decreases the quantization error of the model. As an example, this callback could generate logits for given input and compare it with the logits from the original, non-quantized model with the same input, and only return True if there is an improvement. As this is a greedy optimization, it’s possible that calling this function multiple times yields incremental improvements.
Replace the LoRA weights of a model quantized with bitsandbytes, using the LoftQ technique.
The replacement is done on the fly by loading in the non-quantized weights from a locally stored safetensors model file and initializing the LoRA weights such that the quantization error between the original and quantized weights is minimized.
As lazy loading is not possible with pickle, normal PyTorch checkpoint files cannot be supported.
Depending on the model size, calling this function may take some time to finish.
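A callback of the kind described for the callback parameter can be built as a small closure that remembers the best error seen so far and keeps a replacement only if the error drops. In the sketch below, measure_error is a hypothetical user-supplied function (for example, an MSE between the logits of the quantized and the original model); the toy demonstration feeds in a fixed sequence of error values instead of a real model.

```python
from typing import Callable


def make_improvement_callback(measure_error: Callable[[object], float],
                              initial_error: float):
    """Return a callback(model, module_name) -> bool that keeps only improvements."""
    best = {"error": initial_error}

    def callback(model, module_name: str) -> bool:
        error = measure_error(model)
        if error < best["error"]:
            best["error"] = error  # keep the replacement and remember the new best
            return True
        return False  # signal that the replacement should be rolled back

    return callback


# Toy demonstration with made-up error values instead of a real model
errors = iter([0.8, 0.9, 0.5])
callback = make_improvement_callback(lambda model: next(errors), initial_error=1.0)
decisions = [callback(None, name) for name in ("layer.0", "layer.1", "layer.2")]
print(decisions)
```

Such a callback would then be passed through the callback argument shown in the signature above, e.g. replace_lora_weights_loftq(peft_model, callback=callback).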
Eva
EvaConfig
class peft.EvaConfig
< source >( rho: float = 2.0 tau: float = 0.99 use_label_mask: bool = True label_mask_value: int = -100 whiten: bool = False adjust_scaling_factors: bool = True )
Parameters
-  rho (float) — Rho value for EVA redistribution (>= 1.0). The maximum rank for a layer is lora_r * rho. Default is 2.0, meaning the maximum rank allowed for a layer is 2r. Increasing rho will allow for a higher degree of redistribution of ranks across layers. Some pre-trained models might be more sensitive to a rank redistribution. It can therefore be beneficial to try rho=1.0 (no redistribution) if the performance is lower than expected.
-  tau (float) — Cosine similarity threshold for early stopping. Compares the cosine similarity of right-singular vectors between two consecutive SVD steps. If the cosine similarity is above this threshold, the SVD iteration is stopped. Default is 0.99.
-  use_label_mask (bool) — Use label mask for EVA initialization. This means that positions where labels=label_mask_value are ignored for the SVD computation. Setting use_label_mask=True is preferred in most cases and can be especially beneficial for multi-turn conversations. The default value is True. Filtering out items based on the label mask can sometimes lead to a small batch size and as a result instabilities in the SVD computation. For cases where a large share of batch items would be filtered out, set use_label_mask=False.
-  label_mask_value (int) — If use_label_mask=True the value to look for to mask out ignored tokens. Default is -100.
-  whiten (bool) — Apply whitening to singular vectors. Default is False. Whitening has been shown to be beneficial for EVA in the vision domain.
-  adjust_scaling_factors (bool) — Adjust LoRA scaling factors after the rank redistribution. Setting this to True means the scaling factors are adjusted so that all LoRA gradients have the same scale regardless of their rank. Default is True.
This is the sub-configuration class to store the configuration for a data-driven initialization via EVA. EVA was introduced in Explained Variance Adaptation.
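The effect of rho can be illustrated with a simplified rank redistribution: each layer may receive at most r * rho ranks, and a fixed overall budget is handed out to the components with the highest explained variance ratio. The sketch below is a simplified reading of the rho description, not PEFT's implementation; the total budget of r times the number of layers, the global top-k selection, the function name, and the example ratios are all assumptions.

```python
import math


def redistribute_ranks(explained_variance: dict, r: int, rho: float) -> dict:
    """Assign ranks to layers by explained variance, capped at floor(r * rho) per layer."""
    max_rank = math.floor(r * rho)
    # Each layer offers its top `max_rank` components as candidates.
    candidates = [
        (ratio, layer)
        for layer, ratios in explained_variance.items()
        for ratio in sorted(ratios, reverse=True)[:max_rank]
    ]
    budget = r * len(explained_variance)  # assumed total rank budget
    candidates.sort(key=lambda c: c[0], reverse=True)
    ranks = {layer: 0 for layer in explained_variance}
    for _, layer in candidates[:budget]:
        ranks[layer] += 1
    return ranks


# Made-up explained variance ratios for two layers
ratios = {
    "q_proj": [0.5, 0.3, 0.1, 0.05],
    "v_proj": [0.2, 0.04, 0.03, 0.01],
}
with_redistribution = redistribute_ranks(ratios, r=2, rho=2.0)
without_redistribution = redistribute_ranks(ratios, r=2, rho=1.0)
print(with_redistribution, without_redistribution)
```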
initialize_lora_eva_weights
peft.initialize_lora_eva_weights
< source >( model: Module dataloader: typing.Optional[collections.abc.Iterable] = None eva_state_dict: typing.Optional[dict] = None forward_fn: typing.Optional[callable] = forward_fn_dict prepare_model_inputs_fn: typing.Optional[callable] = prepare_model_inputs_fn_language_modeling prepare_layer_inputs_fn: typing.Union[callable, dict[str, callable], NoneType] = prepare_layer_inputs_fn_language_modeling adapter_name: str = 'default' gather_distributed_inputs: bool = True show_progress_bar: bool = True ) → model (torch.nn.Module)
Parameters
- model (PeftModel) — The peft model to compute the SVD for.
- dataloader (Optional[Iterable]) — The dataloader to use for the forward pass. If None, eva_state_dict needs to be provided.
-  eva_state_dict (Optional[dict]) — The state_dict to load into the model. If None, a dataloader needs to be provided and the state_dict will be computed using get_eva_state_dict.
-  forward_fn (callable) — The forward function to use for the forward pass. Takes two arguments: model and inputs. Default behavior is return model(**inputs).
-  prepare_model_inputs_fn (Optional[callable]) — This function receives the model inputs and the peft_config and passes the output to prepare_layer_inputs_fn. Can be used to modify the input to the SVD computation based on the original model inputs. For example for language modeling the attention mask is used to determine which indices are padding tokens and should not be used for SVD. Any function defined here expects two arguments: model_input and peft_config. peft.tuners.lora.eva.prepare_model_inputs_fn_language_modeling is used by default.
-  prepare_layer_inputs_fn (Union[callable, Dict[str, callable], None]) — This function receives the layer inputs, the model inputs (potentially modified by prepare_model_inputs_fn) and the name of the layer and returns the inputs that should be used for SVD for that particular layer. Any custom function defined here expects three arguments: layer_input, model_input, and layer_name and should return a 2d tensor. The default logic can be found in peft.tuners.lora.eva.prepare_layer_inputs_fn_language_modeling and works for language modeling. In this case model_inputs is the mask used to determine which indices should be used for SVD (created by prepare_model_inputs_fn_language_modeling).
- adapter_name (str) — The name of the adapter to initialize the weights for.
- gather_distributed_inputs (bool) — Whether to gather the layer inputs from all ranks. Default is True meaning in a distributed setting the layer inputs will be gathered from all ranks for the SVD computation. For non-distributed settings this argument is ignored. Set to False if you are using a non-distributed dataloader in a distributed setting.
- show_progress_bar (bool) — Whether to show a progress bar. Default is True.
Returns
model (torch.nn.Module)
The model with the initialized LoRA weights.
Initialize the weights of the LoRA layers using the EVA method.
This function initializes the weights of the LoRA layers using the EVA method. It computes the SVD for each adapter layer and updates the weights accordingly.
get_eva_state_dict
peft.get_eva_state_dict
< source >( model: Module dataloader: Iterable peft_config: typing.Optional[peft.tuners.lora.config.LoraConfig] = None forward_fn: typing.Optional[callable] = forward_fn_dict prepare_model_inputs_fn: typing.Optional[callable] = prepare_model_inputs_fn_language_modeling prepare_layer_inputs_fn: typing.Union[callable, dict[str, callable], NoneType] = prepare_layer_inputs_fn_language_modeling adapter_name: str = 'default' gather_distributed_inputs: bool = True show_progress_bar: bool = True ) → eva_state_dict (dict)
Parameters
- model (torch.nn.Module) — The model to compute the SVD for. Does not need to be a PeftModel.
- dataloader (Iterable) — The dataloader to use for the forward pass.
-  peft_config (Optional[LoraConfig]) — The configuration for the LoRA layers. Only required if model is not a PeftModel.
-  forward_fn (callable) — The forward function to use for the forward pass. Takes two arguments: model and inputs. Default behavior is return model(**inputs).
-  prepare_model_inputs_fn (Optional[callable]) — This function receives the model inputs and the peft_config and passes the output to prepare_layer_inputs_fn. Can be used to modify the input to the SVD computation based on the original model inputs. For example for language modeling the attention mask is used to determine which indices are padding tokens and should not be used for SVD. Any function defined here expects two arguments: model_input and peft_config. peft.tuners.lora.eva.prepare_model_inputs_fn_language_modeling is used by default.
-  prepare_layer_inputs_fn (Union[callable, Dict[str, callable], None]) — This function receives the layer inputs, the model inputs (potentially modified by prepare_model_inputs_fn) and the name of the layer and returns the inputs that should be used for SVD for that particular layer. Any custom function defined here expects three arguments: layer_input, model_input, and layer_name and should return a 2d tensor. The default logic can be found in peft.tuners.lora.eva.prepare_layer_inputs_fn_language_modeling and works for language modeling. In this case model_inputs is the mask used to determine which indices should be used for SVD (created by prepare_model_inputs_fn_language_modeling).
- adapter_name (str) — The name of the adapter to compute the SVD for.
- gather_distributed_inputs (bool) — Whether to gather the layer inputs from all ranks. Default is True meaning in a distributed setting the layer inputs will be gathered from all ranks for the SVD computation. For non-distributed settings this argument is ignored. Set to False if you are using a non-distributed dataloader in a distributed setting.
- show_progress_bar (bool) — Whether to show a progress bar. Default is True.
Returns
eva_state_dict (dict)
The state dictionary containing the SVD components for each layer.
Compute the SVD for each layer in the model.
This function computes the Singular Value Decomposition (SVD) for each layer in the model. It uses the incremental PCA method to compute the SVD components. The function also checks for convergence of the computed components using cosine similarity. The rank distribution for each layer is determined based on the explained variance ratio.
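The convergence check mentioned above can be sketched as a comparison of components from two consecutive SVD steps: if every pair of corresponding vectors has a cosine similarity at or above tau, the iteration can stop. This is an illustration of the idea with hypothetical helper names and made-up vectors; whether PEFT uses a strict or non-strict comparison, and how it handles the sign of singular vectors, is not covered here.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def has_converged(prev_components, curr_components, tau: float = 0.99) -> bool:
    """True if every component moved less than the threshold allows."""
    return all(
        cosine_similarity(p, c) >= tau  # assumption: non-strict comparison
        for p, c in zip(prev_components, curr_components)
    )


previous = [[1.0, 0.0], [0.0, 1.0]]
nearly_same = [[1.0, 0.01], [0.01, 1.0]]   # tiny rotation between steps
still_moving = [[1.0, 0.5], [0.0, 1.0]]    # first component changed noticeably

print(has_converged(previous, nearly_same))
print(has_converged(previous, still_moving))
```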