Transformers documentation

Auto Classes

You are viewing v4.56.2 version. A newer version v4.57.1 is available.

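In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you supply to the from_pretrained() method. The auto classes do this job for you: given the name or path of the pretrained weights, configuration, or vocabulary, they automatically retrieve the relevant model.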

Instantiating one of AutoConfig, AutoModel, and AutoTokenizer directly creates a class of the relevant architecture. For instance,

from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

creates a model that is an instance of BertModel.
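The same dispatch can be observed without downloading anything. The sketch below rests on a few assumptions: PyTorch is installed, the tiny BERT hyperparameters are arbitrary values chosen only to keep memory low, and it uses AutoModel.from_config(), which builds a randomly initialized model from a configuration instead of loading pretrained weights.

```python
import tempfile

from transformers import AutoConfig, AutoModel, BertConfig

# Arbitrary, deliberately tiny hyperparameters (not a real checkpoint)
tiny = BertConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64)

with tempfile.TemporaryDirectory() as tmp:
    tiny.save_pretrained(tmp)                 # writes config.json, including "model_type": "bert"
    config = AutoConfig.from_pretrained(tmp)  # AutoConfig reads model_type and builds a BertConfig

# from_config() picks the architecture from the config class; the weights are random
model = AutoModel.from_config(config)
print(type(config).__name__, type(model).__name__)  # BertConfig BertModel
```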

There is one AutoModel class for each task, and for each backend (PyTorch, TensorFlow, or Flax).
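For example, a single configuration can feed several task-specific auto classes, each returning the architecture with the matching head. This is a sketch assuming the PyTorch backend; the tiny hyperparameters and num_labels=3 are arbitrary illustrative values.

```python
from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification, BertConfig

# Illustrative tiny config; num_labels sizes the classification head
config = BertConfig(
    hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64, num_labels=3
)

mlm = AutoModelForMaskedLM.from_config(config)                       # masked-language-modeling head
classifier = AutoModelForSequenceClassification.from_config(config)  # sequence-classification head

print(type(mlm).__name__)         # BertForMaskedLM
print(type(classifier).__name__)  # BertForSequenceClassification
```

The TensorFlow and Flax counterparts follow the same naming pattern with a TF or Flax prefix (for example TFAutoModel and FlaxAutoModel).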

Extending the Auto Classes

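Each of the auto classes has a method for extending it with your own custom classes. For instance, if you have defined a custom model class NewModel, make sure you also have a NewModelConfig; you can then add both to the auto classes like this: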

from transformers import AutoConfig, AutoModel

AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

You will then be able to use the auto classes as you usually would!

If your NewModelConfig is a subclass of PretrainedConfig, make sure its model_type attribute is set to the same key you use when registering the config (here "new-model").

Likewise, if your NewModel is a subclass of PreTrainedModel, make sure its config_class attribute is set to the same class you use when registering the model (here NewModelConfig).
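Putting both requirements together, a minimal end-to-end sketch might look like the following. NewModelConfig and NewModel here are hypothetical toy classes (a single linear layer), the hidden_size attribute is an illustrative choice, and PyTorch is assumed to be installed.

```python
import tempfile

import torch
from torch import nn
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class NewModelConfig(PretrainedConfig):
    # Must equal the key passed to AutoConfig.register below
    model_type = "new-model"

    def __init__(self, hidden_size=8, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class NewModel(PreTrainedModel):
    # Must equal the config class passed to AutoModel.register below
    config_class = NewModelConfig

    def __init__(self, config):
        super().__init__(config)
        self.proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.post_init()  # standard PreTrainedModel weight-initialization hook

    def forward(self, x):
        return self.proj(x)


AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

# Round-trip through disk: the auto classes now resolve "new-model" to our classes
with tempfile.TemporaryDirectory() as tmp:
    NewModel(NewModelConfig(hidden_size=4)).save_pretrained(tmp)
    reloaded = AutoModel.from_pretrained(tmp)

out = reloaded(torch.ones(1, 4))
print(type(reloaded).__name__, tuple(out.shape))  # NewModel (1, 4)
```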

AutoConfig

class transformers.AutoConfig

( )

This is a generic configuration class that will be instantiated as one of the configuration classes of the library when created with the from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).
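A quick check of this behavior, assuming a recent Transformers release where the constructor raises an OSError. AutoConfig.for_model(), which is not documented in this section, is shown as one supported way to build a fresh default configuration from a model_type key.

```python
from transformers import AutoConfig

try:
    AutoConfig()  # direct construction is deliberately blocked
    raised = False
except OSError:
    raised = True

# Supported alternative: build a default config from a model_type key
config = AutoConfig.for_model("bert")
print(raised, type(config).__name__)  # True BertConfig
```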

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike[str]], **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
    • A path to a directory containing a configuration file saved using the PretrainedConfig.save_pretrained() method or the PreTrainedModel.save_pretrained() method, e.g., ./my_model_directory/.
    • A path or url to a saved configuration JSON file, e.g., ./my_model_directory/configuration.json.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since huggingface.co stores models and other artifacts in a git-based system, revision can be any identifier allowed by git.
  • return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final configuration object.

    If True, then this function returns a Tuple(config, unused_kwargs), where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not configuration attributes, i.e., the part of kwargs that has not been used to update config and is otherwise ignored.

  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • kwargs (additional keyword arguments, optional) — The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not configuration attributes is controlled by the return_unused_kwargs keyword parameter.
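The local forms of pretrained_model_name_or_path and the kwargs / return_unused_kwargs behavior can be checked without network access. This sketch assumes only an installed transformers package; the num_hidden_layers values and the not_a_config_key name are arbitrary.

```python
import os
import tempfile

from transformers import AutoConfig, BertConfig

with tempfile.TemporaryDirectory() as tmp:
    BertConfig(num_hidden_layers=2).save_pretrained(tmp)  # writes tmp/config.json

    # A directory produced by save_pretrained()
    from_dir = AutoConfig.from_pretrained(tmp)

    # The configuration JSON file itself
    from_file = AutoConfig.from_pretrained(os.path.join(tmp, "config.json"))

    # Config attributes in kwargs override loaded values; other keys are handed back
    overridden, unused = AutoConfig.from_pretrained(
        tmp, num_hidden_layers=4, not_a_config_key=1, return_unused_kwargs=True
    )

print(from_dir.num_hidden_layers, from_file.num_hidden_layers)  # 2 2
print(overridden.num_hidden_layers, unused)  # 4 {'not_a_config_key': 1}
```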

Instantiate one of the configuration classes of the library from a pretrained model configuration.

The configuration class to instantiate is selected based on the model_type property of the config object that is loaded, or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • aimv2: Aimv2Config (AIMv2 model)
  • aimv2_vision_model: Aimv2VisionConfig (Aimv2VisionModel model)
  • albert: AlbertConfig (ALBERT model)
  • align: AlignConfig (ALIGN model)
  • altclip: AltCLIPConfig (AltCLIP model)
  • apertus: ApertusConfig (Apertus model)
  • arcee: ArceeConfig (Arcee model)
  • aria: AriaConfig (Aria model)
  • aria_text: AriaTextConfig (AriaText model)
  • audio-spectrogram-transformer: ASTConfig (Audio Spectrogram Transformer model)
  • autoformer: AutoformerConfig (Autoformer model)
  • aya_vision: AyaVisionConfig (AyaVision model)
  • bamba: BambaConfig (Bamba model)
  • bark: BarkConfig (Bark model)
  • bart: BartConfig (BART model)
  • beit: BeitConfig (BEiT model)
  • bert: BertConfig (BERT model)
  • bert-generation: BertGenerationConfig (Bert Generation model)
  • big_bird: BigBirdConfig (BigBird model)
  • bigbird_pegasus: BigBirdPegasusConfig (BigBird-Pegasus model)
  • biogpt: BioGptConfig (BioGpt model)
  • bit: BitConfig (BiT model)
  • bitnet: BitNetConfig (BitNet model)
  • blenderbot: BlenderbotConfig (Blenderbot model)
  • blenderbot-small: BlenderbotSmallConfig (BlenderbotSmall model)
  • blip: BlipConfig (BLIP model)
  • blip-2: Blip2Config (BLIP-2 model)
  • blip_2_qformer: Blip2QFormerConfig (BLIP-2 QFormer model)
  • bloom: BloomConfig (BLOOM model)
  • bridgetower: BridgeTowerConfig (BridgeTower model)
  • bros: BrosConfig (BROS model)
  • camembert: CamembertConfig (CamemBERT model)
  • canine: CanineConfig (CANINE model)
  • chameleon: ChameleonConfig (Chameleon model)
  • chinese_clip: ChineseCLIPConfig (Chinese-CLIP model)
  • chinese_clip_vision_model: ChineseCLIPVisionConfig (ChineseCLIPVisionModel model)
  • clap: ClapConfig (CLAP model)
  • clip: CLIPConfig (CLIP model)
  • clip_text_model: CLIPTextConfig (CLIPTextModel model)
  • clip_vision_model: CLIPVisionConfig (CLIPVisionModel model)
  • clipseg: CLIPSegConfig (CLIPSeg model)
  • clvp: ClvpConfig (CLVP model)
  • code_llama: LlamaConfig (CodeLlama model)
  • codegen: CodeGenConfig (CodeGen model)
  • cohere: CohereConfig (Cohere model)
  • cohere2: Cohere2Config (Cohere2 model)
  • cohere2_vision: Cohere2VisionConfig (Cohere2Vision model)
  • colpali: ColPaliConfig (ColPali model)
  • colqwen2: ColQwen2Config (ColQwen2 model)
  • conditional_detr: ConditionalDetrConfig (Conditional DETR model)
  • convbert: ConvBertConfig (ConvBERT model)
  • convnext: ConvNextConfig (ConvNeXT model)
  • convnextv2: ConvNextV2Config (ConvNeXTV2 model)
  • cpmant: CpmAntConfig (CPM-Ant model)
  • csm: CsmConfig (CSM model)
  • ctrl: CTRLConfig (CTRL model)
  • cvt: CvtConfig (CvT model)
  • d_fine: DFineConfig (D-FINE model)
  • dab-detr: DabDetrConfig (DAB-DETR model)
  • dac: DacConfig (DAC model)
  • data2vec-audio: Data2VecAudioConfig (Data2VecAudio model)
  • data2vec-text: Data2VecTextConfig (Data2VecText model)
  • data2vec-vision: Data2VecVisionConfig (Data2VecVision model)
  • dbrx: DbrxConfig (DBRX model)
  • deberta: DebertaConfig (DeBERTa model)
  • deberta-v2: DebertaV2Config (DeBERTa-v2 model)
  • decision_transformer: DecisionTransformerConfig (Decision Transformer model)
  • deepseek_v2: DeepseekV2Config (DeepSeek-V2 model)
  • deepseek_v3: DeepseekV3Config (DeepSeek-V3 model)
  • deepseek_vl: DeepseekVLConfig (DeepseekVL model)
  • deepseek_vl_hybrid: DeepseekVLHybridConfig (DeepseekVLHybrid model)
  • deformable_detr: DeformableDetrConfig (Deformable DETR model)
  • deit: DeiTConfig (DeiT model)
  • depth_anything: DepthAnythingConfig (Depth Anything model)
  • depth_pro: DepthProConfig (DepthPro model)
  • deta: DetaConfig (DETA model)
  • detr: DetrConfig (DETR model)
  • dia: DiaConfig (Dia model)
  • diffllama: DiffLlamaConfig (DiffLlama model)
  • dinat: DinatConfig (DiNAT model)
  • dinov2: Dinov2Config (DINOv2 model)
  • dinov2_with_registers: Dinov2WithRegistersConfig (DINOv2 with Registers model)
  • dinov3_convnext: DINOv3ConvNextConfig (DINOv3 ConvNext model)
  • dinov3_vit: DINOv3ViTConfig (DINOv3 ViT model)
  • distilbert: DistilBertConfig (DistilBERT model)
  • doge: DogeConfig (Doge model)
  • donut-swin: DonutSwinConfig (DonutSwin model)
  • dots1: Dots1Config (dots1 model)
  • dpr: DPRConfig (DPR model)
  • dpt: DPTConfig (DPT model)
  • efficientformer: EfficientFormerConfig (EfficientFormer model)
  • efficientloftr: EfficientLoFTRConfig (EfficientLoFTR model)
  • efficientnet: EfficientNetConfig (EfficientNet model)
  • electra: ElectraConfig (ELECTRA model)
  • emu3: Emu3Config (Emu3 model)
  • encodec: EncodecConfig (EnCodec model)
  • encoder-decoder: EncoderDecoderConfig (Encoder decoder model)
  • eomt: EomtConfig (EoMT model)
  • ernie: ErnieConfig (ERNIE model)
  • ernie4_5: Ernie4_5Config (Ernie4_5 model)
  • ernie4_5_moe: Ernie4_5_MoeConfig (Ernie4_5_MoE model)
  • ernie_m: ErnieMConfig (ErnieM model)
  • esm: EsmConfig (ESM model)
  • evolla: EvollaConfig (Evolla model)
  • exaone4: Exaone4Config (EXAONE-4.0 model)
  • falcon: FalconConfig (Falcon model)
  • falcon_h1: FalconH1Config (FalconH1 model)
  • falcon_mamba: FalconMambaConfig (FalconMamba model)
  • fastspeech2_conformer: FastSpeech2ConformerConfig (FastSpeech2Conformer model)
  • fastspeech2_conformer_with_hifigan: FastSpeech2ConformerWithHifiGanConfig (FastSpeech2ConformerWithHifiGan model)
  • flaubert: FlaubertConfig (FlauBERT model)
  • flava: FlavaConfig (FLAVA model)
  • florence2: Florence2Config (Florence2 model)
  • fnet: FNetConfig (FNet model)
  • focalnet: FocalNetConfig (FocalNet model)
  • fsmt: FSMTConfig (FairSeq Machine-Translation model)
  • funnel: FunnelConfig (Funnel Transformer model)
  • fuyu: FuyuConfig (Fuyu model)
  • gemma: GemmaConfig (Gemma model)
  • gemma2: Gemma2Config (Gemma2 model)
  • gemma3: Gemma3Config (Gemma3ForConditionalGeneration model)
  • gemma3_text: Gemma3TextConfig (Gemma3ForCausalLM model)
  • gemma3n: Gemma3nConfig (Gemma3nForConditionalGeneration model)
  • gemma3n_audio: Gemma3nAudioConfig (Gemma3nAudioEncoder model)
  • gemma3n_text: Gemma3nTextConfig (Gemma3nForCausalLM model)
  • gemma3n_vision: Gemma3nVisionConfig (TimmWrapperModel model)
  • git: GitConfig (GIT model)
  • glm: GlmConfig (GLM model)
  • glm4: Glm4Config (GLM4 model)
  • glm4_moe: Glm4MoeConfig (Glm4MoE model)
  • glm4v: Glm4vConfig (GLM4V model)
  • glm4v_moe: Glm4vMoeConfig (GLM4VMOE model)
  • glm4v_moe_text: Glm4vMoeTextConfig (GLM4VMOE model)
  • glm4v_text: Glm4vTextConfig (GLM4V model)
  • glpn: GLPNConfig (GLPN model)
  • got_ocr2: GotOcr2Config (GOT-OCR2 model)
  • gpt-sw3: GPT2Config (GPT-Sw3 model)
  • gpt2: GPT2Config (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeConfig (GPTBigCode model)
  • gpt_neo: GPTNeoConfig (GPT Neo model)
  • gpt_neox: GPTNeoXConfig (GPT NeoX model)
  • gpt_neox_japanese: GPTNeoXJapaneseConfig (GPT NeoX Japanese model)
  • gpt_oss: GptOssConfig (GptOss model)
  • gptj: GPTJConfig (GPT-J model)
  • gptsan-japanese: GPTSanJapaneseConfig (GPTSAN-japanese model)
  • granite: GraniteConfig (Granite model)
  • granite_speech: GraniteSpeechConfig (GraniteSpeech model)
  • granitemoe: GraniteMoeConfig (GraniteMoeMoe model)
  • granitemoehybrid: GraniteMoeHybridConfig (GraniteMoeHybrid model)
  • granitemoeshared: GraniteMoeSharedConfig (GraniteMoeSharedMoe model)
  • granitevision: LlavaNextConfig (LLaVA-NeXT model)
  • graphormer: GraphormerConfig (Graphormer model)
  • grounding-dino: GroundingDinoConfig (Grounding DINO model)
  • groupvit: GroupViTConfig (GroupViT model)
  • helium: HeliumConfig (Helium model)
  • hgnet_v2: HGNetV2Config (HGNet-V2 model)
  • hiera: HieraConfig (Hiera model)
  • hubert: HubertConfig (Hubert model)
  • hunyuan_v1_dense: HunYuanDenseV1Config (HunYuanDenseV1 model)
  • hunyuan_v1_moe: HunYuanMoEV1Config (HunYuanMoeV1 model)
  • ibert: IBertConfig (I-BERT model)
  • idefics: IdeficsConfig (IDEFICS model)
  • idefics2: Idefics2Config (Idefics2 model)
  • idefics3: Idefics3Config (Idefics3 model)
  • idefics3_vision: Idefics3VisionConfig (Idefics3VisionTransformer model)
  • ijepa: IJepaConfig (I-JEPA model)
  • imagegpt: ImageGPTConfig (ImageGPT model)
  • informer: InformerConfig (Informer model)
  • instructblip: InstructBlipConfig (InstructBLIP model)
  • instructblipvideo: InstructBlipVideoConfig (InstructBlipVideo model)
  • internvl: InternVLConfig (InternVL model)
  • internvl_vision: InternVLVisionConfig (InternVLVision model)
  • jamba: JambaConfig (Jamba model)
  • janus: JanusConfig (Janus model)
  • jetmoe: JetMoeConfig (JetMoe model)
  • jukebox: JukeboxConfig (Jukebox model)
  • kosmos-2: Kosmos2Config (KOSMOS-2 model)
  • kosmos-2.5: Kosmos2_5Config (KOSMOS-2.5 model)
  • kyutai_speech_to_text: KyutaiSpeechToTextConfig (KyutaiSpeechToText model)
  • layoutlm: LayoutLMConfig (LayoutLM model)
  • layoutlmv2: LayoutLMv2Config (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3Config (LayoutLMv3 model)
  • led: LEDConfig (LED model)
  • levit: LevitConfig (LeViT model)
  • lfm2: Lfm2Config (Lfm2 model)
  • lightglue: LightGlueConfig (LightGlue model)
  • lilt: LiltConfig (LiLT model)
  • llama: LlamaConfig (LLaMA model)
  • llama4: Llama4Config (Llama4 model)
  • llama4_text: Llama4TextConfig (Llama4ForCausalLM model)
  • llava: LlavaConfig (LLaVa model)
  • llava_next: LlavaNextConfig (LLaVA-NeXT model)
  • llava_next_video: LlavaNextVideoConfig (LLaVa-NeXT-Video model)
  • llava_onevision: LlavaOnevisionConfig (LLaVA-Onevision model)
  • longformer: LongformerConfig (Longformer model)
  • longt5: LongT5Config (LongT5 model)
  • luke: LukeConfig (LUKE model)
  • lxmert: LxmertConfig (LXMERT model)
  • m2m_100: M2M100Config (M2M100 model)
  • mamba: MambaConfig (Mamba model)
  • mamba2: Mamba2Config (mamba2 model)
  • marian: MarianConfig (Marian model)
  • markuplm: MarkupLMConfig (MarkupLM model)
  • mask2former: Mask2FormerConfig (Mask2Former model)
  • maskformer: MaskFormerConfig (MaskFormer model)
  • maskformer-swin: MaskFormerSwinConfig (MaskFormerSwin model)
  • mbart: MBartConfig (mBART model)
  • mctct: MCTCTConfig (M-CTC-T model)
  • mega: MegaConfig (MEGA model)
  • megatron-bert: MegatronBertConfig (Megatron-BERT model)
  • metaclip_2: MetaClip2Config (MetaCLIP 2 model)
  • mgp-str: MgpstrConfig (MGP-STR model)
  • mimi: MimiConfig (Mimi model)
  • minimax: MiniMaxConfig (MiniMax model)
  • mistral: MistralConfig (Mistral model)
  • mistral3: Mistral3Config (Mistral3 model)
  • mixtral: MixtralConfig (Mixtral model)
  • mlcd: MLCDVisionConfig (MLCD model)
  • mllama: MllamaConfig (Mllama model)
  • mm-grounding-dino: MMGroundingDinoConfig (MM Grounding DINO model)
  • mobilebert: MobileBertConfig (MobileBERT model)
  • mobilenet_v1: MobileNetV1Config (MobileNetV1 model)
  • mobilenet_v2: MobileNetV2Config (MobileNetV2 model)
  • mobilevit: MobileViTConfig (MobileViT model)
  • mobilevitv2: MobileViTV2Config (MobileViTV2 model)
  • modernbert: ModernBertConfig (ModernBERT model)
  • modernbert-decoder: ModernBertDecoderConfig (ModernBertDecoder model)
  • moonshine: MoonshineConfig (Moonshine model)
  • moshi: MoshiConfig (Moshi model)
  • mpnet: MPNetConfig (MPNet model)
  • mpt: MptConfig (MPT model)
  • mra: MraConfig (MRA model)
  • mt5: MT5Config (MT5 model)
  • musicgen: MusicgenConfig (MusicGen model)
  • musicgen_melody: MusicgenMelodyConfig (MusicGen Melody model)
  • mvp: MvpConfig (MVP model)
  • nat: NatConfig (NAT model)
  • nemotron: NemotronConfig (Nemotron model)
  • nezha: NezhaConfig (Nezha model)
  • nllb-moe: NllbMoeConfig (NLLB-MOE model)
  • nougat: VisionEncoderDecoderConfig (Nougat model)
  • nystromformer: NystromformerConfig (Nyströmformer model)
  • olmo: OlmoConfig (OLMo model)
  • olmo2: Olmo2Config (OLMo2 model)
  • olmoe: OlmoeConfig (OLMoE model)
  • omdet-turbo: OmDetTurboConfig (OmDet-Turbo model)
  • oneformer: OneFormerConfig (OneFormer model)
  • open-llama: OpenLlamaConfig (OpenLlama model)
  • openai-gpt: OpenAIGPTConfig (OpenAI GPT model)
  • opt: OPTConfig (OPT model)
  • ovis2: Ovis2Config (Ovis2 model)
  • owlv2: Owlv2Config (OWLv2 model)
  • owlvit: OwlViTConfig (OWL-ViT model)
  • paligemma: PaliGemmaConfig (PaliGemma model)
  • patchtsmixer: PatchTSMixerConfig (PatchTSMixer model)
  • patchtst: PatchTSTConfig (PatchTST model)
  • pegasus: PegasusConfig (Pegasus model)
  • pegasus_x: PegasusXConfig (PEGASUS-X model)
  • perceiver: PerceiverConfig (Perceiver model)
  • perception_encoder: TimmWrapperConfig (PerceptionEncoder model)
  • perception_lm: PerceptionLMConfig (PerceptionLM model)
  • persimmon: PersimmonConfig (Persimmon model)
  • phi: PhiConfig (Phi model)
  • phi3: Phi3Config (Phi3 model)
  • phi4_multimodal: Phi4MultimodalConfig (Phi4Multimodal model)
  • phimoe: PhimoeConfig (Phimoe model)
  • pix2struct: Pix2StructConfig (Pix2Struct model)
  • pixtral: PixtralVisionConfig (Pixtral model)
  • plbart: PLBartConfig (PLBart model)
  • poolformer: PoolFormerConfig (PoolFormer model)
  • pop2piano: Pop2PianoConfig (Pop2Piano model)
  • prompt_depth_anything: PromptDepthAnythingConfig (PromptDepthAnything model)
  • prophetnet: ProphetNetConfig (ProphetNet model)
  • pvt: PvtConfig (PVT model)
  • pvt_v2: PvtV2Config (PVTv2 model)
  • qdqbert: QDQBertConfig (QDQBert model)
  • qwen2: Qwen2Config (Qwen2 model)
  • qwen2_5_omni: Qwen2_5OmniConfig (Qwen2_5Omni model)
  • qwen2_5_vl: Qwen2_5_VLConfig (Qwen2_5_VL model)
  • qwen2_5_vl_text: Qwen2_5_VLTextConfig (Qwen2_5_VL model)
  • qwen2_audio: Qwen2AudioConfig (Qwen2Audio model)
  • qwen2_audio_encoder: Qwen2AudioEncoderConfig (Qwen2AudioEncoder model)
  • qwen2_moe: Qwen2MoeConfig (Qwen2MoE model)
  • qwen2_vl: Qwen2VLConfig (Qwen2VL model)
  • qwen2_vl_text: Qwen2VLTextConfig (Qwen2VL model)
  • qwen3: Qwen3Config (Qwen3 model)
  • qwen3_moe: Qwen3MoeConfig (Qwen3MoE model)
  • rag: RagConfig (RAG model)
  • realm: RealmConfig (REALM model)
  • recurrent_gemma: RecurrentGemmaConfig (RecurrentGemma model)
  • reformer: ReformerConfig (Reformer model)
  • regnet: RegNetConfig (RegNet model)
  • rembert: RemBertConfig (RemBERT model)
  • resnet: ResNetConfig (ResNet model)
  • retribert: RetriBertConfig (RetriBERT model)
  • roberta: RobertaConfig (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormConfig (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertConfig (RoCBert model)
  • roformer: RoFormerConfig (RoFormer model)
  • rt_detr: RTDetrConfig (RT-DETR model)
  • rt_detr_resnet: RTDetrResNetConfig (RT-DETR-ResNet model)
  • rt_detr_v2: RTDetrV2Config (RT-DETRv2 model)
  • rwkv: RwkvConfig (RWKV model)
  • sam: SamConfig (SAM model)
  • sam2: Sam2Config (SAM2 model)
  • sam2_hiera_det_model: Sam2HieraDetConfig (Sam2HieraDetModel model)
  • sam2_video: Sam2VideoConfig (Sam2VideoModel model)
  • sam2_vision_model: Sam2VisionConfig (Sam2VisionModel model)
  • sam_hq: SamHQConfig (SAM-HQ model)
  • sam_hq_vision_model: SamHQVisionConfig (SamHQVisionModel model)
  • sam_vision_model: SamVisionConfig (SamVisionModel model)
  • seamless_m4t: SeamlessM4TConfig (SeamlessM4T model)
  • seamless_m4t_v2: SeamlessM4Tv2Config (SeamlessM4Tv2 model)
  • seed_oss: SeedOssConfig (SeedOss model)
  • segformer: SegformerConfig (SegFormer model)
  • seggpt: SegGptConfig (SegGPT model)
  • sew: SEWConfig (SEW model)
  • sew-d: SEWDConfig (SEW-D model)
  • shieldgemma2: ShieldGemma2Config (Shieldgemma2 model)
  • siglip: SiglipConfig (SigLIP model)
  • siglip2: Siglip2Config (SigLIP2 model)
  • siglip_vision_model: SiglipVisionConfig (SiglipVisionModel model)
  • smollm3: SmolLM3Config (SmolLM3 model)
  • smolvlm: SmolVLMConfig (SmolVLM model)
  • smolvlm_vision: SmolVLMVisionConfig (SmolVLMVisionTransformer model)
  • speech-encoder-decoder: SpeechEncoderDecoderConfig (Speech Encoder decoder model)
  • speech_to_text: Speech2TextConfig (Speech2Text model)
  • speech_to_text_2: Speech2Text2Config (Speech2Text2 model)
  • speecht5: SpeechT5Config (SpeechT5 model)
  • splinter: SplinterConfig (Splinter model)
  • squeezebert: SqueezeBertConfig (SqueezeBERT model)
  • stablelm: StableLmConfig (StableLm model)
  • starcoder2: Starcoder2Config (Starcoder2 model)
  • superglue: SuperGlueConfig (SuperGlue model)
  • superpoint: SuperPointConfig (SuperPoint model)
  • swiftformer: SwiftFormerConfig (SwiftFormer model)
  • swin: SwinConfig (Swin Transformer model)
  • swin2sr: Swin2SRConfig (Swin2SR model)
  • swinv2: Swinv2Config (Swin Transformer V2 model)
  • switch_transformers: SwitchTransformersConfig (SwitchTransformers model)
  • t5: T5Config (T5 model)
  • t5gemma: T5GemmaConfig (T5Gemma model)
  • table-transformer: TableTransformerConfig (Table Transformer model)
  • tapas: TapasConfig (TAPAS model)
  • textnet: TextNetConfig (TextNet model)
  • time_series_transformer: TimeSeriesTransformerConfig (Time Series Transformer model)
  • timesfm: TimesFmConfig (TimesFm model)
  • timesformer: TimesformerConfig (TimeSformer model)
  • timm_backbone: TimmBackboneConfig (TimmBackbone model)
  • timm_wrapper: TimmWrapperConfig (TimmWrapperModel model)
  • trajectory_transformer: TrajectoryTransformerConfig (Trajectory Transformer model)
  • transfo-xl: TransfoXLConfig (Transformer-XL model)
  • trocr: TrOCRConfig (TrOCR model)
  • tvlt: TvltConfig (TVLT model)
  • tvp: TvpConfig (TVP model)
  • udop: UdopConfig (UDOP model)
  • umt5: UMT5Config (UMT5 model)
  • unispeech: UniSpeechConfig (UniSpeech model)
  • unispeech-sat: UniSpeechSatConfig (UniSpeechSat model)
  • univnet: UnivNetConfig (UnivNet model)
  • upernet: UperNetConfig (UPerNet model)
  • van: VanConfig (VAN model)
  • video_llava: VideoLlavaConfig (VideoLlava model)
  • videomae: VideoMAEConfig (VideoMAE model)
  • vilt: ViltConfig (ViLT model)
  • vipllava: VipLlavaConfig (VipLlava model)
  • vision-encoder-decoder: VisionEncoderDecoderConfig (Vision Encoder decoder model)
  • vision-text-dual-encoder: VisionTextDualEncoderConfig (VisionTextDualEncoder model)
  • visual_bert: VisualBertConfig (VisualBERT model)
  • vit: ViTConfig (ViT model)
  • vit_hybrid: ViTHybridConfig (ViT Hybrid model)
  • vit_mae: ViTMAEConfig (ViTMAE model)
  • vit_msn: ViTMSNConfig (ViTMSN model)
  • vitdet: VitDetConfig (VitDet model)
  • vitmatte: VitMatteConfig (ViTMatte model)
  • vitpose: VitPoseConfig (ViTPose model)
  • vitpose_backbone: VitPoseBackboneConfig (ViTPoseBackbone model)
  • vits: VitsConfig (VITS model)
  • vivit: VivitConfig (ViViT model)
  • vjepa2: VJEPA2Config (VJEPA2Model model)
  • voxtral: VoxtralConfig (Voxtral model)
  • voxtral_encoder: VoxtralEncoderConfig (Voxtral Encoder model)
  • wav2vec2: Wav2Vec2Config (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertConfig (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerConfig (Wav2Vec2-Conformer model)
  • wavlm: WavLMConfig (WavLM model)
  • whisper: WhisperConfig (Whisper model)
  • xclip: XCLIPConfig (X-CLIP model)
  • xcodec: XcodecConfig (X-CODEC model)
  • xglm: XGLMConfig (XGLM model)
  • xlm: XLMConfig (XLM model)
  • xlm-prophetnet: XLMProphetNetConfig (XLM-ProphetNet model)
  • xlm-roberta: XLMRobertaConfig (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLConfig (XLM-RoBERTa-XL model)
  • xlnet: XLNetConfig (XLNet model)
  • xlstm: xLSTMConfig (xLSTM model)
  • xmod: XmodConfig (X-MOD model)
  • yolos: YolosConfig (YOLOS model)
  • yoso: YosoConfig (YOSO model)
  • zamba: ZambaConfig (Zamba model)
  • zamba2: Zamba2Config (Zamba2 model)
  • zoedepth: ZoeDepthConfig (ZoeDepth model)
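To see the model_type lookup in isolation, the sketch below writes a hand-made configuration JSON whose model_type is "gpt2" and loads it through AutoConfig. The n_layer value is an arbitrary illustrative choice, and the pattern-matching fallback on the path is not exercised here.

```python
import json
import os
import tempfile

from transformers import AutoConfig

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        # Only model_type and one override; every other field keeps its GPT2Config default
        json.dump({"model_type": "gpt2", "n_layer": 2}, f)
    config = AutoConfig.from_pretrained(path)

print(type(config).__name__, config.n_layer)  # GPT2Config 2
```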

Examples:

>>> from transformers import AutoConfig

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")

>>> # Download configuration from huggingface.co (user-uploaded) and cache.
>>> config = AutoConfig.from_pretrained("dbmdz/bert-base-german-cased")

>>> # If the configuration file is in a directory (e.g., was saved using *save_pretrained('./test/bert_saved_model/')*).
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/")

>>> # Load a specific configuration file.
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/my_configuration.json")

>>> # Change some config attributes when loading a pretrained config.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
>>> config.output_attentions
True

>>> config, unused_kwargs = AutoConfig.from_pretrained(
...     "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
... )
>>> config.output_attentions
True

>>> unused_kwargs
{'foo': False}

register

( model_type, config, exist_ok = False )

Parameters

  • model_type (str) — The model type like “bert” or “gpt”.
  • config (PretrainedConfig) — The config to register.

Register a new configuration for this class.
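The requirement that a config's model_type match its registration key is checked by register() itself. In the releases we have looked at, passing a key that differs from the config's model_type raises a ValueError; MyModelConfig and both keys below are hypothetical.

```python
from transformers import AutoConfig, PretrainedConfig


class MyModelConfig(PretrainedConfig):
    model_type = "my-model"  # hypothetical model type


# A key that does not match MyModelConfig.model_type is rejected
try:
    AutoConfig.register("a-different-key", MyModelConfig)
    rejected = False
except ValueError:
    rejected = True

# The matching key is accepted, after which AutoConfig knows the new type
AutoConfig.register("my-model", MyModelConfig)
config = AutoConfig.for_model("my-model")
print(rejected, type(config).__name__)  # True MyModelConfig
```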

AutoTokenizer

class transformers.AutoTokenizer

( )

This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_pretrained

( pretrained_model_name_or_path, *inputs, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co.
    • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
    • A path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like Bert or XLNet), e.g., ./my_model_directory/vocab.txt. (Not applicable to all derived classes.)
  • inputs (additional positional arguments, optional) — Will be passed along to the Tokenizer __init__() method.
  • config (PretrainedConfig, optional) — The configuration object used to determine the tokenizer class to instantiate.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since huggingface.co stores models and other artifacts in a git-based system, revision can be any identifier allowed by git.
  • subfolder (str, optional) — In case the relevant files are located inside a subfolder of the model repo on huggingface.co (e.g. for facebook/rag-token-base), specify it here.
  • use_fast (bool, optional, defaults to True) — Use a fast Rust-based tokenizer if it is supported for a given model. If a fast tokenizer is not available for a given model, a normal Python-based tokenizer is returned instead.
  • tokenizer_type (str, optional) — Tokenizer type to be loaded, given as one of the model type keys listed below (e.g., "bert").
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • kwargs (additional keyword arguments, optional) — Will be passed to the Tokenizer __init__() method. Can be used to set special tokens like bos_token, eos_token, unk_token, sep_token, pad_token, cls_token, mask_token, additional_special_tokens. See parameters in the __init__() for more details.
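Several of these parameters can be tried fully offline by first saving a tokenizer to a local directory. In this sketch the seven-token vocabulary is made up for illustration, and the fast tokenizer is converted on the fly from the slow tokenizer's vocab.txt, which relies on the tokenizers package that normally ships with Transformers.

```python
import os
import tempfile

from transformers import AutoTokenizer, BertTokenizer

# Made-up toy vocabulary: ids follow line order (0 = [PAD], 1 = [UNK], ...)
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp:
    vocab_file = os.path.join(tmp, "vocab.txt")
    with open(vocab_file, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    # save_pretrained() records tokenizer_class in tokenizer_config.json
    BertTokenizer(vocab_file).save_pretrained(tmp)

    fast = AutoTokenizer.from_pretrained(tmp)                 # use_fast=True is the default
    slow = AutoTokenizer.from_pretrained(tmp, use_fast=False)  # Python-based tokenizer

print(type(fast).__name__, type(slow).__name__)  # BertTokenizerFast BertTokenizer
print(fast("hello world")["input_ids"])          # [2, 5, 6, 3]
```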

Instantiate one of the tokenizer classes of the library from a pretrained model vocabulary.

The tokenizer class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • aimv2: CLIPTokenizer or CLIPTokenizerFast (AIMv2 model)
  • albert: AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
  • align: BertTokenizer or BertTokenizerFast (ALIGN model)
  • arcee: LlamaTokenizer or LlamaTokenizerFast (Arcee model)
  • aria: LlamaTokenizer or LlamaTokenizerFast (Aria model)
  • aya_vision: CohereTokenizerFast (AyaVision model)
  • bark: BertTokenizer or BertTokenizerFast (Bark model)
  • bart: BartTokenizer or BartTokenizerFast (BART model)
  • barthez: BarthezTokenizer or BarthezTokenizerFast (BARThez model)
  • bartpho: BartphoTokenizer (BARTpho model)
  • bert: BertTokenizer or BertTokenizerFast (BERT model)
  • bert-generation: BertGenerationTokenizer (Bert Generation model)
  • bert-japanese: BertJapaneseTokenizer (BertJapanese model)
  • bertweet: BertweetTokenizer (BERTweet model)
  • big_bird: BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
  • bigbird_pegasus: PegasusTokenizer or PegasusTokenizerFast (BigBird-Pegasus model)
  • biogpt: BioGptTokenizer (BioGpt model)
  • bitnet: PreTrainedTokenizerFast (BitNet model)
  • blenderbot: BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
  • blenderbot-small: BlenderbotSmallTokenizer (BlenderbotSmall model)
  • blip: BertTokenizer or BertTokenizerFast (BLIP model)
  • blip-2: GPT2Tokenizer or GPT2TokenizerFast (BLIP-2 model)
  • bloom: BloomTokenizerFast (BLOOM model)
  • bridgetower: RobertaTokenizer or RobertaTokenizerFast (BridgeTower model)
  • bros: BertTokenizer or BertTokenizerFast (BROS model)
  • byt5: ByT5Tokenizer (ByT5 model)
  • camembert: CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
  • canine: CanineTokenizer (CANINE model)
  • chameleon: LlamaTokenizer or LlamaTokenizerFast (Chameleon model)
  • chinese_clip: BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
  • clap: RobertaTokenizer or RobertaTokenizerFast (CLAP model)
  • clip: CLIPTokenizer or CLIPTokenizerFast (CLIP model)
  • clipseg: CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
  • clvp: ClvpTokenizer (CLVP model)
  • code_llama: CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
  • codegen: CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
  • cohere: CohereTokenizerFast (Cohere model)
  • cohere2: CohereTokenizerFast (Cohere2 model)
  • colpali: LlamaTokenizer or LlamaTokenizerFast (ColPali model)
  • colqwen2: Qwen2Tokenizer or Qwen2TokenizerFast (ColQwen2 model)
  • convbert: ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
  • cpm: CpmTokenizer or CpmTokenizerFast (CPM model)
  • cpmant: CpmAntTokenizer (CPM-Ant model)
  • ctrl: CTRLTokenizer (CTRL model)
  • data2vec-audio: Wav2Vec2CTCTokenizer (Data2VecAudio model)
  • data2vec-text: RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
  • dbrx: GPT2Tokenizer or GPT2TokenizerFast (DBRX model)
  • deberta: DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
  • deberta-v2: DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
  • deepseek_v2: LlamaTokenizer or LlamaTokenizerFast (DeepSeek-V2 model)
  • deepseek_v3: LlamaTokenizer or LlamaTokenizerFast (DeepSeek-V3 model)
  • deepseek_vl: LlamaTokenizer or LlamaTokenizerFast (DeepseekVL model)
  • deepseek_vl_hybrid: LlamaTokenizer or LlamaTokenizerFast (DeepseekVLHybrid model)
  • dia: DiaTokenizer (Dia model)
  • diffllama: LlamaTokenizer or LlamaTokenizerFast (DiffLlama model)
  • distilbert: DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
  • dpr: DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
  • electra: ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
  • emu3: GPT2Tokenizer or GPT2TokenizerFast (Emu3 model)
  • ernie: BertTokenizer or BertTokenizerFast (ERNIE model)
  • ernie4_5: LlamaTokenizerFast (Ernie4_5 model)
  • ernie4_5_moe: LlamaTokenizerFast (Ernie4_5_MoE model)
  • ernie_m: ErnieMTokenizer (ErnieM model)
  • esm: EsmTokenizer (ESM model)
  • exaone4: GPT2Tokenizer or GPT2TokenizerFast (EXAONE-4.0 model)
  • falcon: PreTrainedTokenizerFast (Falcon model)
  • falcon_mamba: GPTNeoXTokenizerFast (FalconMamba model)
  • fastspeech2_conformer: (FastSpeech2Conformer model)
  • flaubert: FlaubertTokenizer (FlauBERT model)
  • fnet: FNetTokenizer or FNetTokenizerFast (FNet model)
  • fsmt: FSMTTokenizer (FairSeq Machine-Translation model)
  • funnel: FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
  • gemma: GemmaTokenizer or GemmaTokenizerFast (Gemma model)
  • gemma2: GemmaTokenizer or GemmaTokenizerFast (Gemma2 model)
  • gemma3: GemmaTokenizer or GemmaTokenizerFast (Gemma3ForConditionalGeneration model)
  • gemma3_text: GemmaTokenizer or GemmaTokenizerFast (Gemma3ForCausalLM model)
  • gemma3n: GemmaTokenizer or GemmaTokenizerFast (Gemma3nForConditionalGeneration model)
  • gemma3n_text: GemmaTokenizer or GemmaTokenizerFast (Gemma3nForCausalLM model)
  • git: BertTokenizer or BertTokenizerFast (GIT model)
  • glm: PreTrainedTokenizerFast (GLM model)
  • glm4: PreTrainedTokenizerFast (GLM4 model)
  • glm4_moe: PreTrainedTokenizerFast (Glm4MoE model)
  • glm4v: PreTrainedTokenizerFast (GLM4V model)
  • glm4v_moe: PreTrainedTokenizerFast (GLM4VMOE model)
  • gpt-sw3: GPTSw3Tokenizer (GPT-Sw3 model)
  • gpt2: GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
  • gpt_bigcode: GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
  • gpt_neo: GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
  • gpt_neox: GPTNeoXTokenizerFast (GPT NeoX model)
  • gpt_neox_japanese: GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
  • gpt_oss: PreTrainedTokenizerFast (GptOss model)
  • gptj: GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
  • gptsan-japanese: GPTSanJapaneseTokenizer (GPTSAN-japanese model)
  • granite: GPT2Tokenizer (Granite model)
  • granitemoe: GPT2Tokenizer (GraniteMoeMoe model)
  • granitemoehybrid: GPT2Tokenizer (GraniteMoeHybrid model)
  • granitemoeshared: GPT2Tokenizer (GraniteMoeSharedMoe model)
  • grounding-dino: BertTokenizer or BertTokenizerFast (Grounding DINO model)
  • groupvit: CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
  • helium: PreTrainedTokenizerFast (Helium model)
  • herbert: HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
  • hubertWav2Vec2CTCTokenizer (Hubert model)
  • ibertRobertaTokenizer or RobertaTokenizerFast (I-BERT model)
  • ideficsLlamaTokenizerFast (IDEFICS model)
  • idefics2LlamaTokenizer or LlamaTokenizerFast (Idefics2 model)
  • idefics3LlamaTokenizer or LlamaTokenizerFast (Idefics3 model)
  • instructblipGPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
  • instructblipvideoGPT2Tokenizer or GPT2TokenizerFast (InstructBlipVideo model)
  • internvlQwen2Tokenizer or Qwen2TokenizerFast (InternVL model)
  • jambaLlamaTokenizer or LlamaTokenizerFast (Jamba model)
  • janusLlamaTokenizerFast (Janus model)
  • jetmoeLlamaTokenizer or LlamaTokenizerFast (JetMoe model)
  • jukeboxJukeboxTokenizer (Jukebox model)
  • kosmos-2XLMRobertaTokenizer or XLMRobertaTokenizerFast (KOSMOS-2 model)
  • kosmos-2.5PreTrainedTokenizerFast (KOSMOS-2.5 model)
  • layoutlmLayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
  • layoutlmv2LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
  • layoutlmv3LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
  • layoutxlmLayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
  • ledLEDTokenizer or LEDTokenizerFast (LED model)
  • liltLayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
  • llamaLlamaTokenizer or LlamaTokenizerFast (LLaMA model)
  • llama4LlamaTokenizer or LlamaTokenizerFast (Llama4 model)
  • llama4_textLlamaTokenizer or LlamaTokenizerFast (Llama4ForCausalLM model)
  • llavaLlamaTokenizer or LlamaTokenizerFast (LLaVa model)
  • llava_nextLlamaTokenizer or LlamaTokenizerFast (LLaVA-NeXT model)
  • llava_next_videoLlamaTokenizer or LlamaTokenizerFast (LLaVa-NeXT-Video model)
  • llava_onevisionLlamaTokenizer or LlamaTokenizerFast (LLaVA-Onevision model)
  • longformerLongformerTokenizer or LongformerTokenizerFast (Longformer model)
  • longt5T5Tokenizer or T5TokenizerFast (LongT5 model)
  • lukeLukeTokenizer (LUKE model)
  • lxmertLxmertTokenizer or LxmertTokenizerFast (LXMERT model)
  • m2m_100M2M100Tokenizer (M2M100 model)
  • mambaGPTNeoXTokenizerFast (Mamba model)
  • mamba2GPTNeoXTokenizerFast (mamba2 model)
  • marianMarianTokenizer (Marian model)
  • mbartMBartTokenizer or MBartTokenizerFast (mBART model)
  • mbart50MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
  • megaRobertaTokenizer or RobertaTokenizerFast (MEGA model)
  • megatron-bertBertTokenizer or BertTokenizerFast (Megatron-BERT model)
  • metaclip_2XLMRobertaTokenizer or XLMRobertaTokenizerFast (MetaCLIP 2 model)
  • mgp-strMgpstrTokenizer (MGP-STR model)
  • minimaxGPT2Tokenizer or GPT2TokenizerFast (MiniMax model)
  • mistralMistralCommonTokenizer (Mistral model)
  • mixtralMistralCommonTokenizer (Mixtral model)
  • mllamaLlamaTokenizer or LlamaTokenizerFast (Mllama model)
  • mlukeMLukeTokenizer (mLUKE model)
  • mm-grounding-dinoBertTokenizer or BertTokenizerFast (MM Grounding DINO model)
  • mobilebertMobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
  • modernbertPreTrainedTokenizerFast (ModernBERT model)
  • moonshinePreTrainedTokenizerFast (Moonshine model)
  • moshiPreTrainedTokenizerFast (Moshi model)
  • mpnetMPNetTokenizer or MPNetTokenizerFast (MPNet model)
  • mptGPTNeoXTokenizerFast (MPT model)
  • mraRobertaTokenizer or RobertaTokenizerFast (MRA model)
  • mt5MT5Tokenizer or MT5TokenizerFast (MT5 model)
  • musicgenT5Tokenizer or T5TokenizerFast (MusicGen model)
  • musicgen_melodyT5Tokenizer or T5TokenizerFast (MusicGen Melody model)
  • mvpMvpTokenizer or MvpTokenizerFast (MVP model)
  • myt5MyT5Tokenizer (myt5 model)
  • nemotronPreTrainedTokenizerFast (Nemotron model)
  • nezhaBertTokenizer or BertTokenizerFast (Nezha model)
  • nllbNllbTokenizer or NllbTokenizerFast (NLLB model)
  • nllb-moeNllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
  • nystromformerAlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
  • olmoGPTNeoXTokenizerFast (OLMo model)
  • olmo2GPTNeoXTokenizerFast (OLMo2 model)
  • olmoeGPTNeoXTokenizerFast (OLMoE model)
  • omdet-turboCLIPTokenizer or CLIPTokenizerFast (OmDet-Turbo model)
  • oneformerCLIPTokenizer or CLIPTokenizerFast (OneFormer model)
  • openai-gptOpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
  • optGPT2Tokenizer or GPT2TokenizerFast (OPT model)
  • owlv2CLIPTokenizer or CLIPTokenizerFast (OWLv2 model)
  • owlvitCLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
  • paligemmaLlamaTokenizer or LlamaTokenizerFast (PaliGemma model)
  • pegasusPegasusTokenizer or PegasusTokenizerFast (Pegasus model)
  • pegasus_xPegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
  • perceiverPerceiverTokenizer (Perceiver model)
  • persimmonLlamaTokenizer or LlamaTokenizerFast (Persimmon model)
  • phiCodeGenTokenizer or CodeGenTokenizerFast (Phi model)
  • phi3LlamaTokenizer or LlamaTokenizerFast (Phi3 model)
  • phimoeLlamaTokenizer or LlamaTokenizerFast (Phimoe model)
  • phobertPhobertTokenizer (PhoBERT model)
  • pix2structT5Tokenizer or T5TokenizerFast (Pix2Struct model)
  • pixtralMistralCommonTokenizer (Pixtral model)
  • plbartPLBartTokenizer (PLBart model)
  • prophetnetProphetNetTokenizer (ProphetNet model)
  • qdqbertBertTokenizer or BertTokenizerFast (QDQBert model)
  • qwen2Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2 model)
  • qwen2_5_omniQwen2Tokenizer or Qwen2TokenizerFast (Qwen2_5Omni model)
  • qwen2_5_vlQwen2Tokenizer or Qwen2TokenizerFast (Qwen2_5_VL model)
  • qwen2_audioQwen2Tokenizer or Qwen2TokenizerFast (Qwen2Audio model)
  • qwen2_moeQwen2Tokenizer or Qwen2TokenizerFast (Qwen2MoE model)
  • qwen2_vlQwen2Tokenizer or Qwen2TokenizerFast (Qwen2VL model)
  • qwen3Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3 model)
  • qwen3_moeQwen2Tokenizer or Qwen2TokenizerFast (Qwen3MoE model)
  • ragRagTokenizer (RAG model)
  • realmRealmTokenizer or RealmTokenizerFast (REALM model)
  • recurrent_gemmaGemmaTokenizer or GemmaTokenizerFast (RecurrentGemma model)
  • reformerReformerTokenizer or ReformerTokenizerFast (Reformer model)
  • rembertRemBertTokenizer or RemBertTokenizerFast (RemBERT model)
  • retribertRetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
  • robertaRobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
  • roberta-prelayernormRobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
  • roc_bertRoCBertTokenizer (RoCBert model)
  • roformerRoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
  • rwkvGPTNeoXTokenizerFast (RWKV model)
  • seamless_m4tSeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4T model)
  • seamless_m4t_v2SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4Tv2 model)
  • shieldgemma2GemmaTokenizer or GemmaTokenizerFast (Shieldgemma2 model)
  • siglipSiglipTokenizer (SigLIP model)
  • siglip2GemmaTokenizer or GemmaTokenizerFast (SigLIP2 model)
  • smollm3PreTrainedTokenizerFast (SmolLM3 model)
  • speech_to_textSpeech2TextTokenizer (Speech2Text model)
  • speech_to_text_2Speech2Text2Tokenizer (Speech2Text2 model)
  • speecht5SpeechT5Tokenizer (SpeechT5 model)
  • splinterSplinterTokenizer or SplinterTokenizerFast (Splinter model)
  • squeezebertSqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
  • stablelmGPTNeoXTokenizerFast (StableLm model)
  • starcoder2GPT2Tokenizer or GPT2TokenizerFast (Starcoder2 model)
  • switch_transformersT5Tokenizer or T5TokenizerFast (SwitchTransformers model)
  • t5T5Tokenizer or T5TokenizerFast (T5 model)
  • t5gemmaGemmaTokenizer or GemmaTokenizerFast (T5Gemma model)
  • tapasTapasTokenizer (TAPAS model)
  • tapexTapexTokenizer (TAPEX model)
  • transfo-xlTransfoXLTokenizer (Transformer-XL model)
  • tvpBertTokenizer or BertTokenizerFast (TVP model)
  • udopUdopTokenizer or UdopTokenizerFast (UDOP model)
  • umt5T5Tokenizer or T5TokenizerFast (UMT5 model)
  • video_llavaLlamaTokenizer or LlamaTokenizerFast (VideoLlava model)
  • viltBertTokenizer or BertTokenizerFast (ViLT model)
  • vipllavaLlamaTokenizer or LlamaTokenizerFast (VipLlava model)
  • visual_bertBertTokenizer or BertTokenizerFast (VisualBERT model)
  • vitsVitsTokenizer (VITS model)
  • voxtralMistralCommonTokenizer (Voxtral model)
  • wav2vec2Wav2Vec2CTCTokenizer (Wav2Vec2 model)
  • wav2vec2-bertWav2Vec2CTCTokenizer (Wav2Vec2-BERT model)
  • wav2vec2-conformerWav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
  • wav2vec2_phonemeWav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
  • whisperWhisperTokenizer or WhisperTokenizerFast (Whisper model)
  • xclipCLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
  • xglmXGLMTokenizer or XGLMTokenizerFast (XGLM model)
  • xlmXLMTokenizer (XLM model)
  • xlm-prophetnetXLMProphetNetTokenizer (XLM-ProphetNet model)
  • xlm-robertaXLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
  • xlm-roberta-xlXLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
  • xlnetXLNetTokenizer or XLNetTokenizerFast (XLNet model)
  • xlstmGPTNeoXTokenizerFast (xLSTM model)
  • xmodXLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
  • yosoAlbertTokenizer or AlbertTokenizerFast (YOSO model)
  • zambaLlamaTokenizer or LlamaTokenizerFast (Zamba model)
  • zamba2LlamaTokenizer or LlamaTokenizerFast (Zamba2 model)

Examples:

>>> from transformers import AutoTokenizer

>>> # Download vocabulary from huggingface.co and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

>>> # Download vocabulary from huggingface.co (user-uploaded) and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")

>>> # If vocabulary files are in a directory (e.g. tokenizer was saved using *save_pretrained('./test/bert_saved_model/')*)
>>> # tokenizer = AutoTokenizer.from_pretrained("./test/bert_saved_model/")

>>> # Download vocabulary from huggingface.co and define model-specific arguments
>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base", add_prefix_space=True)

register

< >

( config_class slow_tokenizer_class = None fast_tokenizer_class = None exist_ok = False )

Parameters

  • config_class (PretrainedConfig) — The configuration corresponding to the model to register.
  • slow_tokenizer_class (PreTrainedTokenizer, optional) — The slow tokenizer to register.
  • fast_tokenizer_class (PreTrainedTokenizerFast, optional) — The fast tokenizer to register.

Register a new tokenizer in this mapping.

AutoFeatureExtractor

class transformers.AutoFeatureExtractor

< >

( )

This is a generic feature extractor class that will be instantiated as one of the feature extractor classes of the library when created with the AutoFeatureExtractor.from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).
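Auto classes act purely as factories. As a hedged illustration of that pattern (not the library's code), the toy class below refuses direct construction and instead builds a concrete class from a `model_type` key. `ToyAutoConfig`, `BertLikeConfig` and `build` are hypothetical, and raising `EnvironmentError` specifically is an assumption, since the text above only says that an error is thrown.

```python
# Toy "auto" class: direct construction fails, a classmethod dispatches.
# ToyAutoConfig, BertLikeConfig and build() are hypothetical names.
from typing import Any, Dict


class BertLikeConfig:
    model_type = "bert"

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)  # store every field as an attribute


_CONFIG_CLASSES: Dict[str, type] = {"bert": BertLikeConfig}


class ToyAutoConfig:
    def __init__(self) -> None:
        # Matches the documented behavior: __init__ always throws.
        raise EnvironmentError(
            "ToyAutoConfig is designed to be instantiated via ToyAutoConfig.build()."
        )

    @classmethod
    def build(cls, config_dict: Dict[str, Any]) -> Any:
        params = dict(config_dict)  # copy so the caller's dict is untouched
        model_type = params.pop("model_type")
        return _CONFIG_CLASSES[model_type](**params)


config = ToyAutoConfig.build({"model_type": "bert", "hidden_size": 768})
print(type(config).__name__, config.hidden_size)  # BertLikeConfig 768
```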

from_pretrained

< >

( pretrained_model_name_or_path **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — This can be either:

    • a string, the model id of a pretrained feature extractor hosted inside a model repo on huggingface.co.
    • a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
    • a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the feature extractor files and override the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final feature extractor object. If True, then this function returns a Tuple(feature_extractor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of kwargs which has not been used to update feature_extractor and is otherwise ignored.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
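The interplay between `kwargs` and `return_unused_kwargs` can be sketched as follows. The sketch assumes a hypothetical `ToyFeatureExtractor` with two attributes and uses a plain dict in place of a loaded `preprocessor_config.json`; `from_loaded_dict` is likewise hypothetical. It only mirrors the documented contract (attribute keys override loaded values, other keys are returned when requested) and is not how transformers implements it.

```python
# Toy version of the return_unused_kwargs contract. ToyFeatureExtractor and
# from_loaded_dict() are hypothetical; `loaded` stands in for a JSON config.
from typing import Any, Dict, Tuple, Union


class ToyFeatureExtractor:
    def __init__(self, sampling_rate: int = 16000, padding_value: float = 0.0):
        self.sampling_rate = sampling_rate
        self.padding_value = padding_value


def from_loaded_dict(
    loaded: Dict[str, Any], return_unused_kwargs: bool = False, **kwargs: Any
) -> Union[ToyFeatureExtractor, Tuple[ToyFeatureExtractor, Dict[str, Any]]]:
    extractor = ToyFeatureExtractor(**loaded)
    unused: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if hasattr(extractor, key):
            setattr(extractor, key, value)  # known attribute: override loaded value
        else:
            unused[key] = value  # not an attribute: collected, otherwise ignored
    if return_unused_kwargs:
        return extractor, unused
    return extractor


fe, unused = from_loaded_dict(
    {"sampling_rate": 16000}, return_unused_kwargs=True, padding_value=1.0, foo="bar"
)
print(fe.sampling_rate, fe.padding_value, unused)  # 16000 1.0 {'foo': 'bar'}
```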

Instantiate one of the feature extractor classes of the library from a pretrained model.

The feature extractor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • audio-spectrogram-transformerASTFeatureExtractor (Audio Spectrogram Transformer model)
  • beitBeitFeatureExtractor (BEiT model)
  • chinese_clipChineseCLIPFeatureExtractor (Chinese-CLIP model)
  • clapClapFeatureExtractor (CLAP model)
  • clipCLIPFeatureExtractor (CLIP model)
  • clipsegViTFeatureExtractor (CLIPSeg model)
  • clvpClvpFeatureExtractor (CLVP model)
  • conditional_detrConditionalDetrFeatureExtractor (Conditional DETR model)
  • convnextConvNextFeatureExtractor (ConvNeXT model)
  • cvtConvNextFeatureExtractor (CvT model)
  • dacDacFeatureExtractor (DAC model)
  • data2vec-audioWav2Vec2FeatureExtractor (Data2VecAudio model)
  • data2vec-visionBeitFeatureExtractor (Data2VecVision model)
  • deformable_detrDeformableDetrFeatureExtractor (Deformable DETR model)
  • deitDeiTFeatureExtractor (DeiT model)
  • detrDetrFeatureExtractor (DETR model)
  • diaDiaFeatureExtractor (Dia model)
  • dinatViTFeatureExtractor (DiNAT model)
  • donut-swinDonutFeatureExtractor (DonutSwin model)
  • dptDPTFeatureExtractor (DPT model)
  • encodecEncodecFeatureExtractor (EnCodec model)
  • flavaFlavaFeatureExtractor (FLAVA model)
  • gemma3nGemma3nAudioFeatureExtractor (Gemma3nForConditionalGeneration model)
  • glpnGLPNFeatureExtractor (GLPN model)
  • granite_speechGraniteSpeechFeatureExtractor (GraniteSpeech model)
  • groupvitCLIPFeatureExtractor (GroupViT model)
  • hubertWav2Vec2FeatureExtractor (Hubert model)
  • imagegptImageGPTFeatureExtractor (ImageGPT model)
  • kyutai_speech_to_textKyutaiSpeechToTextFeatureExtractor (KyutaiSpeechToText model)
  • layoutlmv2LayoutLMv2FeatureExtractor (LayoutLMv2 model)
  • layoutlmv3LayoutLMv3FeatureExtractor (LayoutLMv3 model)
  • levitLevitFeatureExtractor (LeViT model)
  • maskformerMaskFormerFeatureExtractor (MaskFormer model)
  • mctctMCTCTFeatureExtractor (M-CTC-T model)
  • mimiEncodecFeatureExtractor (Mimi model)
  • mobilenet_v1MobileNetV1FeatureExtractor (MobileNetV1 model)
  • mobilenet_v2MobileNetV2FeatureExtractor (MobileNetV2 model)
  • mobilevitMobileViTFeatureExtractor (MobileViT model)
  • moonshineWav2Vec2FeatureExtractor (Moonshine model)
  • moshiEncodecFeatureExtractor (Moshi model)
  • natViTFeatureExtractor (NAT model)
  • owlvitOwlViTFeatureExtractor (OWL-ViT model)
  • perceiverPerceiverFeatureExtractor (Perceiver model)
  • phi4_multimodalPhi4MultimodalFeatureExtractor (Phi4Multimodal model)
  • poolformerPoolFormerFeatureExtractor (PoolFormer model)
  • pop2pianoPop2PianoFeatureExtractor (Pop2Piano model)
  • regnetConvNextFeatureExtractor (RegNet model)
  • resnetConvNextFeatureExtractor (ResNet model)
  • seamless_m4tSeamlessM4TFeatureExtractor (SeamlessM4T model)
  • seamless_m4t_v2SeamlessM4TFeatureExtractor (SeamlessM4Tv2 model)
  • segformerSegformerFeatureExtractor (SegFormer model)
  • sewWav2Vec2FeatureExtractor (SEW model)
  • sew-dWav2Vec2FeatureExtractor (SEW-D model)
  • speech_to_textSpeech2TextFeatureExtractor (Speech2Text model)
  • speecht5SpeechT5FeatureExtractor (SpeechT5 model)
  • swiftformerViTFeatureExtractor (SwiftFormer model)
  • swinViTFeatureExtractor (Swin Transformer model)
  • swinv2ViTFeatureExtractor (Swin Transformer V2 model)
  • table-transformerDetrFeatureExtractor (Table Transformer model)
  • timesformerVideoMAEFeatureExtractor (TimeSformer model)
  • tvltTvltFeatureExtractor (TVLT model)
  • unispeechWav2Vec2FeatureExtractor (UniSpeech model)
  • unispeech-satWav2Vec2FeatureExtractor (UniSpeechSat model)
  • univnetUnivNetFeatureExtractor (UnivNet model)
  • vanConvNextFeatureExtractor (VAN model)
  • videomaeVideoMAEFeatureExtractor (VideoMAE model)
  • viltViltFeatureExtractor (ViLT model)
  • vitViTFeatureExtractor (ViT model)
  • vit_maeViTFeatureExtractor (ViTMAE model)
  • vit_msnViTFeatureExtractor (ViTMSN model)
  • wav2vec2Wav2Vec2FeatureExtractor (Wav2Vec2 model)
  • wav2vec2-bertWav2Vec2FeatureExtractor (Wav2Vec2-BERT model)
  • wav2vec2-conformerWav2Vec2FeatureExtractor (Wav2Vec2-Conformer model)
  • wavlmWav2Vec2FeatureExtractor (WavLM model)
  • whisperWhisperFeatureExtractor (Whisper model)
  • xclipCLIPFeatureExtractor (X-CLIP model)
  • xcodecDacFeatureExtractor (X-CODEC model)
  • yolosYolosFeatureExtractor (YOLOS model)
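The selection rule described above (use `model_type` when it is known, otherwise pattern-match on the name or path) might look like this in miniature. The table holds a few entries copied from the list; `resolve` is hypothetical, and trying longer keys first so that `clipseg` is not captured by `clip` is an assumption of this sketch rather than documented behavior.

```python
# Toy sketch of model_type dispatch with a name-based fallback.
# The table holds a few entries from the list above; resolve() is hypothetical.
from typing import Dict, Optional

FEATURE_EXTRACTORS: Dict[str, str] = {
    "clip": "CLIPFeatureExtractor",
    "clipseg": "ViTFeatureExtractor",
    "vit": "ViTFeatureExtractor",
    "whisper": "WhisperFeatureExtractor",
}


def resolve(name_or_path: str, model_type: Optional[str] = None) -> str:
    if model_type is not None:
        # Preferred path: the config's model_type is known.
        return FEATURE_EXTRACTORS[model_type]
    # Fallback: look for a known key inside the name or path. Longer keys are
    # tried first so "clipseg" is not captured by "clip" (sketch assumption).
    for key in sorted(FEATURE_EXTRACTORS, key=len, reverse=True):
        if key in name_or_path:
            return FEATURE_EXTRACTORS[key]
    raise ValueError(f"Unrecognized feature extractor for {name_or_path!r}")


print(resolve("CIDAS/clipseg-rd64-refined"))  # ViTFeatureExtractor
print(resolve("openai/clip-vit-base-patch32"))  # CLIPFeatureExtractor
print(resolve("./my_model_directory/", model_type="whisper"))  # WhisperFeatureExtractor
```

Substring matching is fragile (a name can contain several keys), which is why the config's `model_type` is the preferred source and the name is only a fallback.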

Passing token=True is required when you want to use a private model.

Examples:

>>> from transformers import AutoFeatureExtractor

>>> # Download feature extractor from huggingface.co and cache.
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

>>> # If feature extractor files are in a directory (e.g. feature extractor was saved using *save_pretrained('./test/saved_model/')*)
>>> # feature_extractor = AutoFeatureExtractor.from_pretrained("./test/saved_model/")

register

< >

( config_class feature_extractor_class exist_ok = False )

Parameters

  • config_class (PretrainedConfig) — The configuration corresponding to the model to register.
  • feature_extractor_class (FeatureExtractorMixin) — The feature extractor to register.

Register a new feature extractor for this class.

AutoImageProcessor

class transformers.AutoImageProcessor

< >

( )

This is a generic image processor class that will be instantiated as one of the image processor classes of the library when created with the AutoImageProcessor.from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_pretrained

< >

( pretrained_model_name_or_path *inputs **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — This can be either:

    • a string, the model id of a pretrained image processor hosted inside a model repo on huggingface.co.
    • a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
    • a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the image processor files and override the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • use_fast (bool, optional, defaults to False) — Use a fast torchvision-based image processor if one is supported for the given model. If no fast image processor is available for the model, the standard NumPy-based image processor is returned instead.
  • return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final image processor object. If True, then this function returns a Tuple(image_processor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not image processor attributes: i.e., the part of kwargs which has not been used to update image_processor and is otherwise ignored.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • image_processor_filename (str, optional, defaults to "config.json") — The name of the file in the model directory to use for the image processor config.
  • kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are image processor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not image processor attributes is controlled by the return_unused_kwargs keyword parameter.
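The `use_fast` fallback can be sketched as a lookup over (slow, fast) pairs taken from the mapping list that follows. `IMAGE_PROCESSORS` and `pick` are hypothetical, and what happens when only a fast class exists and `use_fast=False` (here, the fast class is returned anyway) is an assumption of this sketch, not documented behavior.

```python
# Toy sketch of the use_fast fallback. IMAGE_PROCESSORS and pick() are
# hypothetical; the entries come from the mapping list in this section.
from typing import Dict, Optional, Tuple

IMAGE_PROCESSORS: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "clipseg": ("ViTImageProcessor", "ViTImageProcessorFast"),
    "glpn": ("GLPNImageProcessor", None),  # no fast class listed
    "sam2": (None, "Sam2ImageProcessorFast"),  # only a fast class listed
}


def pick(model_type: str, use_fast: bool = False) -> str:
    slow, fast = IMAGE_PROCESSORS[model_type]
    if use_fast and fast is not None:
        return fast  # fast class requested and available
    if slow is not None:
        return slow  # default choice, or fallback when no fast class exists
    if fast is not None:
        return fast  # assumption: a fast-only entry is used even if use_fast=False
    raise ValueError(f"No image processor registered for {model_type!r}")


print(pick("clipseg", use_fast=True))  # ViTImageProcessorFast
print(pick("glpn", use_fast=True))  # GLPNImageProcessor
```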

Instantiate one of the image processor classes of the library from a pretrained model.

The image processor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • aimv2CLIPImageProcessor or CLIPImageProcessorFast (AIMv2 model)
  • aimv2_vision_modelCLIPImageProcessor or CLIPImageProcessorFast (Aimv2VisionModel model)
  • alignEfficientNetImageProcessor or EfficientNetImageProcessorFast (ALIGN model)
  • ariaAriaImageProcessor (Aria model)
  • beitBeitImageProcessor or BeitImageProcessorFast (BEiT model)
  • bitBitImageProcessor or BitImageProcessorFast (BiT model)
  • blipBlipImageProcessor or BlipImageProcessorFast (BLIP model)
  • blip-2BlipImageProcessor or BlipImageProcessorFast (BLIP-2 model)
  • bridgetowerBridgeTowerImageProcessor or BridgeTowerImageProcessorFast (BridgeTower model)
  • chameleonChameleonImageProcessor or ChameleonImageProcessorFast (Chameleon model)
  • chinese_clipChineseCLIPImageProcessor or ChineseCLIPImageProcessorFast (Chinese-CLIP model)
  • clipCLIPImageProcessor or CLIPImageProcessorFast (CLIP model)
  • clipsegViTImageProcessor or ViTImageProcessorFast (CLIPSeg model)
  • cohere2_visionCohere2VisionImageProcessorFast (Cohere2Vision model)
  • conditional_detrConditionalDetrImageProcessor or ConditionalDetrImageProcessorFast (Conditional DETR model)
  • convnextConvNextImageProcessor or ConvNextImageProcessorFast (ConvNeXT model)
  • convnextv2ConvNextImageProcessor or ConvNextImageProcessorFast (ConvNeXTV2 model)
  • cvtConvNextImageProcessor or ConvNextImageProcessorFast (CvT model)
  • data2vec-visionBeitImageProcessor or BeitImageProcessorFast (Data2VecVision model)
  • deepseek_vlDeepseekVLImageProcessor or DeepseekVLImageProcessorFast (DeepseekVL model)
  • deepseek_vl_hybridDeepseekVLHybridImageProcessor or DeepseekVLHybridImageProcessorFast (DeepseekVLHybrid model)
  • deformable_detrDeformableDetrImageProcessor or DeformableDetrImageProcessorFast (Deformable DETR model)
  • deitDeiTImageProcessor or DeiTImageProcessorFast (DeiT model)
  • depth_anythingDPTImageProcessor or DPTImageProcessorFast (Depth Anything model)
  • depth_proDepthProImageProcessor or DepthProImageProcessorFast (DepthPro model)
  • detaDetaImageProcessor (DETA model)
  • detrDetrImageProcessor or DetrImageProcessorFast (DETR model)
  • dinatViTImageProcessor or ViTImageProcessorFast (DiNAT model)
  • dinov2BitImageProcessor or BitImageProcessorFast (DINOv2 model)
  • dinov3_vitDINOv3ViTImageProcessorFast (DINOv3 ViT model)
  • donut-swinDonutImageProcessor or DonutImageProcessorFast (DonutSwin model)
  • dptDPTImageProcessor or DPTImageProcessorFast (DPT model)
  • efficientformerEfficientFormerImageProcessor (EfficientFormer model)
  • efficientloftrEfficientLoFTRImageProcessor (EfficientLoFTR model)
  • efficientnetEfficientNetImageProcessor or EfficientNetImageProcessorFast (EfficientNet model)
  • eomtEomtImageProcessor or EomtImageProcessorFast (EoMT model)
  • flavaFlavaImageProcessor or FlavaImageProcessorFast (FLAVA model)
  • focalnetBitImageProcessor or BitImageProcessorFast (FocalNet model)
  • fuyuFuyuImageProcessor (Fuyu model)
  • gemma3Gemma3ImageProcessor or Gemma3ImageProcessorFast (Gemma3ForConditionalGeneration model)
  • gemma3nSiglipImageProcessor or SiglipImageProcessorFast (Gemma3nForConditionalGeneration model)
  • gitCLIPImageProcessor or CLIPImageProcessorFast (GIT model)
  • glm4vGlm4vImageProcessor or Glm4vImageProcessorFast (GLM4V model)
  • glpnGLPNImageProcessor (GLPN model)
  • got_ocr2GotOcr2ImageProcessor or GotOcr2ImageProcessorFast (GOT-OCR2 model)
  • grounding-dinoGroundingDinoImageProcessor or GroundingDinoImageProcessorFast (Grounding DINO model)
  • groupvitCLIPImageProcessor or CLIPImageProcessorFast (GroupViT model)
  • hieraBitImageProcessor or BitImageProcessorFast (Hiera model)
  • ideficsIdeficsImageProcessor (IDEFICS model)
  • idefics2Idefics2ImageProcessor or Idefics2ImageProcessorFast (Idefics2 model)
  • idefics3Idefics3ImageProcessor or Idefics3ImageProcessorFast (Idefics3 model)
  • ijepaViTImageProcessor or ViTImageProcessorFast (I-JEPA model)
  • imagegptImageGPTImageProcessor (ImageGPT model)
  • instructblipBlipImageProcessor or BlipImageProcessorFast (InstructBLIP model)
  • instructblipvideoInstructBlipVideoImageProcessor (InstructBlipVideo model)
  • janusJanusImageProcessor or JanusImageProcessorFast (Janus model)
  • kosmos-2CLIPImageProcessor or CLIPImageProcessorFast (KOSMOS-2 model)
  • kosmos-2.5Kosmos2_5ImageProcessor or Kosmos2_5ImageProcessorFast (KOSMOS-2.5 model)
  • layoutlmv2LayoutLMv2ImageProcessor or LayoutLMv2ImageProcessorFast (LayoutLMv2 model)
  • layoutlmv3LayoutLMv3ImageProcessor or LayoutLMv3ImageProcessorFast (LayoutLMv3 model)
  • levitLevitImageProcessor or LevitImageProcessorFast (LeViT model)
  • lightglueLightGlueImageProcessor (LightGlue model)
  • llama4Llama4ImageProcessor or Llama4ImageProcessorFast (Llama4 model)
  • llavaLlavaImageProcessor or LlavaImageProcessorFast (LLaVa model)
  • llava_nextLlavaNextImageProcessor or LlavaNextImageProcessorFast (LLaVA-NeXT model)
  • llava_next_videoLlavaNextVideoImageProcessor (LLaVa-NeXT-Video model)
  • llava_onevisionLlavaOnevisionImageProcessor or LlavaOnevisionImageProcessorFast (LLaVA-Onevision model)
  • mask2former: Mask2FormerImageProcessor or Mask2FormerImageProcessorFast (Mask2Former model)
  • maskformer: MaskFormerImageProcessor or MaskFormerImageProcessorFast (MaskFormer model)
  • metaclip_2: CLIPImageProcessor or CLIPImageProcessorFast (MetaCLIP 2 model)
  • mgp-str: ViTImageProcessor or ViTImageProcessorFast (MGP-STR model)
  • mistral3: PixtralImageProcessor or PixtralImageProcessorFast (Mistral3 model)
  • mlcd: CLIPImageProcessor or CLIPImageProcessorFast (MLCD model)
  • mllama: MllamaImageProcessor (Mllama model)
  • mm-grounding-dino: GroundingDinoImageProcessor or GroundingDinoImageProcessorFast (MM Grounding DINO model)
  • mobilenet_v1: MobileNetV1ImageProcessor or MobileNetV1ImageProcessorFast (MobileNetV1 model)
  • mobilenet_v2: MobileNetV2ImageProcessor or MobileNetV2ImageProcessorFast (MobileNetV2 model)
  • mobilevit: MobileViTImageProcessor or MobileViTImageProcessorFast (MobileViT model)
  • mobilevitv2: MobileViTImageProcessor or MobileViTImageProcessorFast (MobileViTV2 model)
  • nat: ViTImageProcessor or ViTImageProcessorFast (NAT model)
  • nougat: NougatImageProcessor or NougatImageProcessorFast (Nougat model)
  • oneformer: OneFormerImageProcessor or OneFormerImageProcessorFast (OneFormer model)
  • ovis2: Ovis2ImageProcessor or Ovis2ImageProcessorFast (Ovis2 model)
  • owlv2: Owlv2ImageProcessor or Owlv2ImageProcessorFast (OWLv2 model)
  • owlvit: OwlViTImageProcessor or OwlViTImageProcessorFast (OWL-ViT model)
  • paligemma: SiglipImageProcessor or SiglipImageProcessorFast (PaliGemma model)
  • perceiver: PerceiverImageProcessor or PerceiverImageProcessorFast (Perceiver model)
  • perception_lm: PerceptionLMImageProcessorFast (PerceptionLM model)
  • phi4_multimodal: Phi4MultimodalImageProcessorFast (Phi4Multimodal model)
  • pix2struct: Pix2StructImageProcessor (Pix2Struct model)
  • pixtral: PixtralImageProcessor or PixtralImageProcessorFast (Pixtral model)
  • poolformer: PoolFormerImageProcessor or PoolFormerImageProcessorFast (PoolFormer model)
  • prompt_depth_anything: PromptDepthAnythingImageProcessor (PromptDepthAnything model)
  • pvt: PvtImageProcessor or PvtImageProcessorFast (PVT model)
  • pvt_v2: PvtImageProcessor or PvtImageProcessorFast (PVTv2 model)
  • qwen2_5_vl: Qwen2VLImageProcessor or Qwen2VLImageProcessorFast (Qwen2_5_VL model)
  • qwen2_vl: Qwen2VLImageProcessor or Qwen2VLImageProcessorFast (Qwen2VL model)
  • regnet: ConvNextImageProcessor or ConvNextImageProcessorFast (RegNet model)
  • resnet: ConvNextImageProcessor or ConvNextImageProcessorFast (ResNet model)
  • rt_detr: RTDetrImageProcessor or RTDetrImageProcessorFast (RT-DETR model)
  • sam: SamImageProcessor or SamImageProcessorFast (SAM model)
  • sam2: Sam2ImageProcessorFast (SAM2 model)
  • sam_hq: SamImageProcessor or SamImageProcessorFast (SAM-HQ model)
  • segformer: SegformerImageProcessor or SegformerImageProcessorFast (SegFormer model)
  • seggpt: SegGptImageProcessor (SegGPT model)
  • shieldgemma2: Gemma3ImageProcessor or Gemma3ImageProcessorFast (Shieldgemma2 model)
  • siglip: SiglipImageProcessor or SiglipImageProcessorFast (SigLIP model)
  • siglip2: Siglip2ImageProcessor or Siglip2ImageProcessorFast (SigLIP2 model)
  • smolvlm: SmolVLMImageProcessor or SmolVLMImageProcessorFast (SmolVLM model)
  • superglue: SuperGlueImageProcessor (SuperGlue model)
  • superpoint: SuperPointImageProcessor or SuperPointImageProcessorFast (SuperPoint model)
  • swiftformer: ViTImageProcessor or ViTImageProcessorFast (SwiftFormer model)
  • swin: ViTImageProcessor or ViTImageProcessorFast (Swin Transformer model)
  • swin2sr: Swin2SRImageProcessor or Swin2SRImageProcessorFast (Swin2SR model)
  • swinv2: ViTImageProcessor or ViTImageProcessorFast (Swin Transformer V2 model)
  • table-transformer: DetrImageProcessor or DetrImageProcessorFast (Table Transformer model)
  • textnet: TextNetImageProcessor or TextNetImageProcessorFast (TextNet model)
  • timesformer: VideoMAEImageProcessor (TimeSformer model)
  • timm_wrapper: TimmWrapperImageProcessor (TimmWrapperModel model)
  • tvlt: TvltImageProcessor (TVLT model)
  • tvp: TvpImageProcessor or TvpImageProcessorFast (TVP model)
  • udop: LayoutLMv3ImageProcessor or LayoutLMv3ImageProcessorFast (UDOP model)
  • upernet: SegformerImageProcessor or SegformerImageProcessorFast (UPerNet model)
  • van: ConvNextImageProcessor or ConvNextImageProcessorFast (VAN model)
  • videomae: VideoMAEImageProcessor (VideoMAE model)
  • vilt: ViltImageProcessor or ViltImageProcessorFast (ViLT model)
  • vipllava: CLIPImageProcessor or CLIPImageProcessorFast (VipLlava model)
  • vit: ViTImageProcessor or ViTImageProcessorFast (ViT model)
  • vit_hybrid: ViTHybridImageProcessor (ViT Hybrid model)
  • vit_mae: ViTImageProcessor or ViTImageProcessorFast (ViTMAE model)
  • vit_msn: ViTImageProcessor or ViTImageProcessorFast (ViTMSN model)
  • vitmatte: VitMatteImageProcessor or VitMatteImageProcessorFast (ViTMatte model)
  • xclip: CLIPImageProcessor or CLIPImageProcessorFast (X-CLIP model)
  • yolos: YolosImageProcessor or YolosImageProcessorFast (YOLOS model)
  • zoedepth: ZoeDepthImageProcessor or ZoeDepthImageProcessorFast (ZoeDepth model)

Passing token=True is required when you want to use a private model.

Examples:

>>> from transformers import AutoImageProcessor

>>> # Download image processor from huggingface.co and cache.
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

>>> # If image processor files are in a directory (e.g. image processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # image_processor = AutoImageProcessor.from_pretrained("./test/saved_model/")

register

( config_class, image_processor_class = None, slow_image_processor_class = None, fast_image_processor_class = None, exist_ok = False )

Parameters

Register a new image processor for this class.
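Many model types in the mapping above list both a slow and a fast image processor. Correspondingly, register takes separate slow_image_processor_class and fast_image_processor_class arguments. The toy registry below is a minimal sketch of that bookkeeping, not the library's implementation. ToyImageProcessorRegistry, ProcessorPair, the prefer_fast flag, and the placeholder classes are all hypothetical. The rule that exist_ok only guards built-in entries is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProcessorPair:
    """Slow and fast processor classes registered for one config class."""
    slow: Optional[type] = None
    fast: Optional[type] = None


class ToyImageProcessorRegistry:
    """Hypothetical registry keyed by config class (a sketch, not transformers code)."""

    def __init__(self, builtin: Optional[Dict[type, ProcessorPair]] = None):
        self._builtin = dict(builtin or {})  # entries that "ship with the library"
        self._extra: Dict[type, ProcessorPair] = {}  # entries added via register()

    def register(self, config_class, slow_image_processor_class=None,
                 fast_image_processor_class=None, exist_ok=False):
        if slow_image_processor_class is None and fast_image_processor_class is None:
            raise ValueError("pass at least one of the slow or fast processor classes")
        # Simplifying assumption: only overriding a built-in entry needs exist_ok=True.
        if config_class in self._builtin and not exist_ok:
            raise ValueError(f"{config_class.__name__} is already mapped to a built-in processor")
        self._extra[config_class] = ProcessorPair(slow_image_processor_class, fast_image_processor_class)

    def resolve(self, config_class, prefer_fast=True):
        # Registered entries take precedence over built-in ones.
        pair = self._extra.get(config_class) or self._builtin.get(config_class)
        if pair is None:
            raise KeyError(f"no image processor registered for {config_class.__name__}")
        # Use the fast class when asked for and available; otherwise fall back to whichever exists.
        if prefer_fast and pair.fast is not None:
            return pair.fast
        return pair.slow if pair.slow is not None else pair.fast


# Hypothetical placeholder classes standing in for a custom model.
class NewModelConfig: ...
class NewModelImageProcessor: ...
class NewModelImageProcessorFast: ...


registry = ToyImageProcessorRegistry()
registry.register(
    NewModelConfig,
    slow_image_processor_class=NewModelImageProcessor,
    fast_image_processor_class=NewModelImageProcessorFast,
)
print(registry.resolve(NewModelConfig).__name__)                    # NewModelImageProcessorFast
print(registry.resolve(NewModelConfig, prefer_fast=False).__name__)  # NewModelImageProcessor
```

Preferring the fast class when one exists mirrors the "X or XFast" pairs in the list. Which class the real AutoImageProcessor picks depends on its use_fast argument and on the library version.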

AutoProcessor

class transformers.AutoProcessor

( )

This is a generic processor class that will be instantiated as one of the processor classes of the library when created with the AutoProcessor.from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_pretrained

( pretrained_model_name_or_path, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — This can be either:

    • a string, the model id of a pretrained processor hosted inside a model repo on huggingface.co.
    • a path to a directory containing processor files saved using the save_pretrained() method, e.g., ./my_model_directory/.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the feature extractor files and override the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • return_unused_kwargs (bool, optional, defaults to False) — If False, this function returns only the final feature extractor object. If True, this function returns a Tuple(feature_extractor, unused_kwargs), where unused_kwargs is a dictionary of the key/value pairs whose keys are not feature extractor attributes, i.e., the part of kwargs that was not used to update feature_extractor and is otherwise ignored.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
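The kwargs and return_unused_kwargs descriptions above describe a split of the keyword arguments. Keys that name an existing attribute override the loaded value. The remaining keys are either ignored or handed back to the caller. The following is a minimal sketch of that rule over a plain dict. apply_overrides is a hypothetical helper, not a Transformers function, and the sample attribute names are made up for illustration.

```python
from typing import Any, Dict, Tuple, Union


def apply_overrides(
    loaded_attrs: Dict[str, Any], return_unused_kwargs: bool = False, **kwargs: Any
) -> Union[Dict[str, Any], Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Hypothetical helper: override known attributes, collect the rest."""
    attrs = dict(loaded_attrs)  # copy so the loaded values are not mutated
    unused: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in attrs:
            attrs[key] = value  # known attribute: override the loaded value
        else:
            unused[key] = value  # unknown key: not applied to the object
    if return_unused_kwargs:
        return attrs, unused
    return attrs


loaded = {"sampling_rate": 16000, "do_normalize": True}
attrs, unused = apply_overrides(loaded, return_unused_kwargs=True, do_normalize=False, foo="bar")
print(attrs)   # {'sampling_rate': 16000, 'do_normalize': False}
print(unused)  # {'foo': 'bar'}
```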

Instantiate one of the processor classes of the library from a pretrained model vocabulary.

The processor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible):

  • aimv2: CLIPProcessor (AIMv2 model)
  • align: AlignProcessor (ALIGN model)
  • altclip: AltCLIPProcessor (AltCLIP model)
  • aria: AriaProcessor (Aria model)
  • aya_vision: AyaVisionProcessor (AyaVision model)
  • bark: BarkProcessor (Bark model)
  • blip: BlipProcessor (BLIP model)
  • blip-2: Blip2Processor (BLIP-2 model)
  • bridgetower: BridgeTowerProcessor (BridgeTower model)
  • chameleon: ChameleonProcessor (Chameleon model)
  • chinese_clip: ChineseCLIPProcessor (Chinese-CLIP model)
  • clap: ClapProcessor (CLAP model)
  • clip: CLIPProcessor (CLIP model)
  • clipseg: CLIPSegProcessor (CLIPSeg model)
  • clvp: ClvpProcessor (CLVP model)
  • cohere2_vision: Cohere2VisionProcessor (Cohere2Vision model)
  • colpali: ColPaliProcessor (ColPali model)
  • colqwen2: ColQwen2Processor (ColQwen2 model)
  • deepseek_vl: DeepseekVLProcessor (DeepseekVL model)
  • deepseek_vl_hybrid: DeepseekVLHybridProcessor (DeepseekVLHybrid model)
  • dia: DiaProcessor (Dia model)
  • emu3: Emu3Processor (Emu3 model)
  • evolla: EvollaProcessor (Evolla model)
  • flava: FlavaProcessor (FLAVA model)
  • florence2: Florence2Processor (Florence2 model)
  • fuyu: FuyuProcessor (Fuyu model)
  • gemma3: Gemma3Processor (Gemma3ForConditionalGeneration model)
  • gemma3n: Gemma3nProcessor (Gemma3nForConditionalGeneration model)
  • git: GitProcessor (GIT model)
  • glm4v: Glm4vProcessor (GLM4V model)
  • glm4v_moe: Glm4vProcessor (GLM4VMOE model)
  • got_ocr2: GotOcr2Processor (GOT-OCR2 model)
  • granite_speech: GraniteSpeechProcessor (GraniteSpeech model)
  • grounding-dino: GroundingDinoProcessor (Grounding DINO model)
  • groupvit: CLIPProcessor (GroupViT model)
  • hubert: Wav2Vec2Processor (Hubert model)
  • idefics: IdeficsProcessor (IDEFICS model)
  • idefics2: Idefics2Processor (Idefics2 model)
  • idefics3: Idefics3Processor (Idefics3 model)
  • instructblip: InstructBlipProcessor (InstructBLIP model)
  • instructblipvideo: InstructBlipVideoProcessor (InstructBlipVideo model)
  • internvl: InternVLProcessor (InternVL model)
  • janus: JanusProcessor (Janus model)
  • kosmos-2: Kosmos2Processor (KOSMOS-2 model)
  • kosmos-2.5: Kosmos2_5Processor (KOSMOS-2.5 model)
  • kyutai_speech_to_text: KyutaiSpeechToTextProcessor (KyutaiSpeechToText model)
  • layoutlmv2: LayoutLMv2Processor (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3Processor (LayoutLMv3 model)
  • llama4: Llama4Processor (Llama4 model)
  • llava: LlavaProcessor (LLaVa model)
  • llava_next: LlavaNextProcessor (LLaVA-NeXT model)
  • llava_next_video: LlavaNextVideoProcessor (LLaVa-NeXT-Video model)
  • llava_onevision: LlavaOnevisionProcessor (LLaVA-Onevision model)
  • markuplm: MarkupLMProcessor (MarkupLM model)
  • mctct: MCTCTProcessor (M-CTC-T model)
  • metaclip_2: CLIPProcessor (MetaCLIP 2 model)
  • mgp-str: MgpstrProcessor (MGP-STR model)
  • mistral3: PixtralProcessor (Mistral3 model)
  • mllama: MllamaProcessor (Mllama model)
  • mm-grounding-dino: GroundingDinoProcessor (MM Grounding DINO model)
  • moonshine: Wav2Vec2Processor (Moonshine model)
  • oneformer: OneFormerProcessor (OneFormer model)
  • ovis2: Ovis2Processor (Ovis2 model)
  • owlv2: Owlv2Processor (OWLv2 model)
  • owlvit: OwlViTProcessor (OWL-ViT model)
  • paligemma: PaliGemmaProcessor (PaliGemma model)
  • perception_lm: PerceptionLMProcessor (PerceptionLM model)
  • phi4_multimodal: Phi4MultimodalProcessor (Phi4Multimodal model)
  • pix2struct: Pix2StructProcessor (Pix2Struct model)
  • pixtral: PixtralProcessor (Pixtral model)
  • pop2piano: Pop2PianoProcessor (Pop2Piano model)
  • qwen2_5_omni: Qwen2_5OmniProcessor (Qwen2_5Omni model)
  • qwen2_5_vl: Qwen2_5_VLProcessor (Qwen2_5_VL model)
  • qwen2_audio: Qwen2AudioProcessor (Qwen2Audio model)
  • qwen2_vl: Qwen2VLProcessor (Qwen2VL model)
  • sam: SamProcessor (SAM model)
  • sam2: Sam2Processor (SAM2 model)
  • sam_hq: SamHQProcessor (SAM-HQ model)
  • seamless_m4t: SeamlessM4TProcessor (SeamlessM4T model)
  • sew: Wav2Vec2Processor (SEW model)
  • sew-d: Wav2Vec2Processor (SEW-D model)
  • shieldgemma2: ShieldGemma2Processor (Shieldgemma2 model)
  • siglip: SiglipProcessor (SigLIP model)
  • siglip2: Siglip2Processor (SigLIP2 model)
  • smolvlm: SmolVLMProcessor (SmolVLM model)
  • speech_to_text: Speech2TextProcessor (Speech2Text model)
  • speech_to_text_2: Speech2Text2Processor (Speech2Text2 model)
  • speecht5: SpeechT5Processor (SpeechT5 model)
  • trocr: TrOCRProcessor (TrOCR model)
  • tvlt: TvltProcessor (TVLT model)
  • tvp: TvpProcessor (TVP model)
  • udop: UdopProcessor (UDOP model)
  • unispeech: Wav2Vec2Processor (UniSpeech model)
  • unispeech-sat: Wav2Vec2Processor (UniSpeechSat model)
  • video_llava: VideoLlavaProcessor (VideoLlava model)
  • vilt: ViltProcessor (ViLT model)
  • vipllava: LlavaProcessor (VipLlava model)
  • vision-text-dual-encoder: VisionTextDualEncoderProcessor (VisionTextDualEncoder model)
  • voxtral: VoxtralProcessor (Voxtral model)
  • wav2vec2: Wav2Vec2Processor (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2Processor (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2Processor (Wav2Vec2-Conformer model)
  • wavlm: Wav2Vec2Processor (WavLM model)
  • whisper: WhisperProcessor (Whisper model)
  • xclip: XCLIPProcessor (X-CLIP model)
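The selection described above comes down to reading model_type from the saved config and looking it up in a table like the one just listed. The sketch below models only that final lookup step, using a small hand-copied excerpt of the list. processor_name_for and PROCESSOR_NAMES are hypothetical names. The real AutoProcessor.from_pretrained may consult other saved files, such as a processor config, before it falls back to model_type.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical excerpt of the model_type -> processor class table above (names only).
PROCESSOR_NAMES = {
    "clip": "CLIPProcessor",
    "hubert": "Wav2Vec2Processor",
    "wav2vec2": "Wav2Vec2Processor",
    "whisper": "WhisperProcessor",
}


def processor_name_for(model_dir) -> str:
    """Read config.json from model_dir and map its model_type to a processor class name."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    model_type = config["model_type"]
    try:
        return PROCESSOR_NAMES[model_type]
    except KeyError:
        raise ValueError(f"no processor registered for model_type={model_type!r}") from None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(json.dumps({"model_type": "hubert"}))
    print(processor_name_for(tmp))  # Wav2Vec2Processor
```

Note that several model types (hubert, wav2vec2, sew, and others) share one processor class. This is why the table is keyed by model_type rather than by processor.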

Passing token=True is required when you want to use a private model.

Examples:

>>> from transformers import AutoProcessor

>>> # Download processor from huggingface.co and cache.
>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")

>>> # If processor files are in a directory (e.g. processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # processor = AutoProcessor.from_pretrained("./test/saved_model/")

register

( config_class, processor_class, exist_ok = False )

Parameters

  • config_class (PretrainedConfig) — The configuration corresponding to the model to register.
  • processor_class (ProcessorMixin) — The processor to register.

Register a new processor for this class.
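Registering a custom processor follows the same pattern as the AutoConfig and AutoModel registration at the top of this page. The following is a minimal sketch, assuming transformers is installed. NewModelConfig, NewModelProcessor, and the "new-model" key are hypothetical names, and the processor is only registered here, not instantiated.

```python
from transformers import AutoConfig, AutoProcessor, PretrainedConfig, ProcessorMixin


class NewModelConfig(PretrainedConfig):
    # model_type must match the key passed to AutoConfig.register.
    model_type = "new-model"


class NewModelProcessor(ProcessorMixin):
    # Hypothetical processor that wraps only a tokenizer.
    attributes = ["tokenizer"]
    tokenizer_class = "AutoTokenizer"


# Map the "new-model" key to the config, then the config class to the processor.
AutoConfig.register("new-model", NewModelConfig)
AutoProcessor.register(NewModelConfig, NewModelProcessor)
```

After both calls, a checkpoint whose config has model_type "new-model" should resolve to NewModelProcessor in the same way the built-in entries listed above are resolved.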

Generic model classes

The following auto classes are available for instantiating a base model class without a specific head.

AutoModel

class transformers.AutoModel

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
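The following is a minimal sketch of both points, assuming transformers and PyTorch are installed. Direct instantiation of AutoModel is refused. By contrast, from_config builds the base model class mapped to the configuration (here BertModel for a BertConfig) with randomly initialized weights and without downloading anything. The tiny sizes are arbitrary choices that keep the model fast to build. Because this is a base model without a task head, its output exposes hidden states (last_hidden_state) rather than task-specific logits.

```python
import torch
from transformers import AutoModel, BertConfig

# Direct construction is rejected; the message points to from_pretrained/from_config.
try:
    AutoModel()
except OSError as err:
    print(type(err).__name__)  # OSError

# A deliberately tiny configuration (arbitrary sizes) so the model builds quickly.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = AutoModel.from_config(config)  # randomly initialized, nothing downloaded
print(type(model).__name__)  # BertModel

input_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    outputs = model(input_ids=input_ids)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 32])
```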

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • ASTConfig configuration class: ASTModel (Audio Spectrogram Transformer model)
    • Aimv2Config configuration class: Aimv2Model (AIMv2 model)
    • Aimv2VisionConfig configuration class: Aimv2VisionModel (Aimv2VisionModel model)
    • AlbertConfig configuration class: AlbertModel (ALBERT model)
    • AlignConfig configuration class: AlignModel (ALIGN model)
    • AltCLIPConfig configuration class: AltCLIPModel (AltCLIP model)
    • ApertusConfig configuration class: ApertusModel (Apertus model)
    • ArceeConfig configuration class: ArceeModel (Arcee model)
    • AriaConfig configuration class: AriaModel (Aria model)
    • AriaTextConfig configuration class: AriaTextModel (AriaText model)
    • AutoformerConfig configuration class: AutoformerModel (Autoformer model)
    • AyaVisionConfig configuration class: AyaVisionModel (AyaVision model)
    • BambaConfig configuration class: BambaModel (Bamba model)
    • BarkConfig configuration class: BarkModel (Bark model)
    • BartConfig configuration class: BartModel (BART model)
    • BeitConfig configuration class: BeitModel (BEiT model)
    • BertConfig configuration class: BertModel (BERT model)
    • BertGenerationConfig configuration class: BertGenerationEncoder (Bert Generation model)
    • BigBirdConfig configuration class: BigBirdModel (BigBird model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusModel (BigBird-Pegasus model)
    • BioGptConfig configuration class: BioGptModel (BioGpt model)
    • BitConfig configuration class: BitModel (BiT model)
    • BitNetConfig configuration class: BitNetModel (BitNet model)
    • BlenderbotConfig configuration class: BlenderbotModel (Blenderbot model)
    • BlenderbotSmallConfig configuration class: BlenderbotSmallModel (BlenderbotSmall model)
    • Blip2Config configuration class: Blip2Model (BLIP-2 model)
    • Blip2QFormerConfig configuration class: Blip2QFormerModel (BLIP-2 QFormer model)
    • BlipConfig configuration class: BlipModel (BLIP model)
    • BloomConfig configuration class: BloomModel (BLOOM model)
    • BridgeTowerConfig configuration class: BridgeTowerModel (BridgeTower model)
    • BrosConfig configuration class: BrosModel (BROS model)
    • CLIPConfig configuration class: CLIPModel (CLIP model)
    • CLIPSegConfig configuration class: CLIPSegModel (CLIPSeg model)
    • CLIPTextConfig configuration class: CLIPTextModel (CLIPTextModel model)
    • CLIPVisionConfig configuration class: CLIPVisionModel (CLIPVisionModel model)
    • CTRLConfig configuration class: CTRLModel (CTRL model)
    • CamembertConfig configuration class: CamembertModel (CamemBERT model)
    • CanineConfig configuration class: CanineModel (CANINE model)
    • ChameleonConfig configuration class: ChameleonModel (Chameleon model)
    • ChineseCLIPConfig configuration class: ChineseCLIPModel (Chinese-CLIP model)
    • ChineseCLIPVisionConfig configuration class: ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
    • ClapConfig configuration class: ClapModel (CLAP model)
    • ClvpConfig configuration class: ClvpModelForConditionalGeneration (CLVP model)
    • CodeGenConfig configuration class: CodeGenModel (CodeGen model)
    • Cohere2Config configuration class: Cohere2Model (Cohere2 model)
    • Cohere2VisionConfig configuration class: Cohere2VisionModel (Cohere2Vision model)
    • CohereConfig configuration class: CohereModel (Cohere model)
    • ConditionalDetrConfig configuration class: ConditionalDetrModel (Conditional DETR model)
    • ConvBertConfig configuration class: ConvBertModel (ConvBERT model)
    • ConvNextConfig configuration class: ConvNextModel (ConvNeXT model)
    • ConvNextV2Config configuration class: ConvNextV2Model (ConvNeXTV2 model)
    • CpmAntConfig configuration class: CpmAntModel (CPM-Ant model)
    • CsmConfig configuration class: CsmForConditionalGeneration (CSM model)
    • CvtConfig configuration class: CvtModel (CvT model)
    • DFineConfig configuration class: DFineModel (D-FINE model)
    • DINOv3ConvNextConfig configuration class: DINOv3ConvNextModel (DINOv3 ConvNext model)
    • DINOv3ViTConfig configuration class: DINOv3ViTModel (DINOv3 ViT model)
    • DPRConfig configuration class: DPRQuestionEncoder (DPR model)
    • DPTConfig configuration class: DPTModel (DPT model)
    • DabDetrConfig configuration class: DabDetrModel (DAB-DETR model)
    • DacConfig configuration class: DacModel (DAC model)
    • Data2VecAudioConfig configuration class: Data2VecAudioModel (Data2VecAudio model)
    • Data2VecTextConfig configuration class: Data2VecTextModel (Data2VecText model)
    • Data2VecVisionConfig configuration class: Data2VecVisionModel (Data2VecVision model)
    • DbrxConfig configuration class: DbrxModel (DBRX model)
    • DebertaConfig configuration class: DebertaModel (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2Model (DeBERTa-v2 model)
    • DecisionTransformerConfig configuration class: DecisionTransformerModel (Decision Transformer model)
    • DeepseekV2Config configuration class: DeepseekV2Model (DeepSeek-V2 model)
    • DeepseekV3Config configuration class: DeepseekV3Model (DeepSeek-V3 model)
    • DeepseekVLConfig configuration class: DeepseekVLModel (DeepseekVL model)
    • DeepseekVLHybridConfig configuration class: DeepseekVLHybridModel (DeepseekVLHybrid model)
    • DeformableDetrConfig configuration class: DeformableDetrModel (Deformable DETR model)
    • DeiTConfig configuration class: DeiTModel (DeiT model)
    • DepthProConfig configuration class: DepthProModel (DepthPro model)
    • DetaConfig configuration class: DetaModel (DETA model)
    • DetrConfig configuration class: DetrModel (DETR model)
    • DiaConfig configuration class: DiaModel (Dia model)
    • DiffLlamaConfig configuration class: DiffLlamaModel (DiffLlama model)
    • DinatConfig configuration class: DinatModel (DiNAT model)
    • Dinov2Config configuration class: Dinov2Model (DINOv2 model)
    • Dinov2WithRegistersConfig configuration class: Dinov2WithRegistersModel (DINOv2 with Registers model)
    • DistilBertConfig configuration class: DistilBertModel (DistilBERT model)
    • DogeConfig configuration class: DogeModel (Doge model)
    • DonutSwinConfig configuration class: DonutSwinModel (DonutSwin model)
    • Dots1Config configuration class: Dots1Model (dots1 model)
    • EfficientFormerConfig configuration class: EfficientFormerModel (EfficientFormer model)
    • EfficientLoFTRConfig configuration class: EfficientLoFTRModel (EfficientLoFTR model)
    • EfficientNetConfig configuration class: EfficientNetModel (EfficientNet model)
    • ElectraConfig configuration class: ElectraModel (ELECTRA model)
    • Emu3Config configuration class: Emu3Model (Emu3 model)
    • EncodecConfig configuration class: EncodecModel (EnCodec model)
    • Ernie4_5Config configuration class: Ernie4_5Model (Ernie4_5 model)
    • Ernie4_5_MoeConfig configuration class: Ernie4_5_MoeModel (Ernie4_5_MoE model)
    • ErnieConfig configuration class: ErnieModel (ERNIE model)
    • ErnieMConfig configuration class: ErnieMModel (ErnieM model)
    • EsmConfig configuration class: EsmModel (ESM model)
    • EvollaConfig configuration class: EvollaModel (Evolla model)
    • Exaone4Config configuration class: Exaone4Model (EXAONE-4.0 model)
    • FNetConfig configuration class: FNetModel (FNet model)
    • FSMTConfig configuration class: FSMTModel (FairSeq Machine-Translation model)
    • FalconConfig configuration class: FalconModel (Falcon model)
    • FalconH1Config configuration class: FalconH1Model (FalconH1 model)
    • FalconMambaConfig configuration class: FalconMambaModel (FalconMamba model)
    • FastSpeech2ConformerConfig configuration class: FastSpeech2ConformerModel (FastSpeech2Conformer model)
    • FastSpeech2ConformerWithHifiGanConfig configuration class: FastSpeech2ConformerWithHifiGan (FastSpeech2ConformerWithHifiGan model)
    • FlaubertConfig configuration class: FlaubertModel (FlauBERT model)
    • FlavaConfig configuration class: FlavaModel (FLAVA model)
    • Florence2Config configuration class: Florence2Model (Florence2 model)
    • FocalNetConfig configuration class: FocalNetModel (FocalNet model)
    • FunnelConfig configuration class: FunnelModel or FunnelBaseModel (Funnel Transformer model)
    • FuyuConfig configuration class: FuyuModel (Fuyu model)
    • GLPNConfig configuration class: GLPNModel (GLPN model)
    • GPT2Config configuration class: GPT2Model (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeModel (GPTBigCode model)
    • GPTJConfig configuration class: GPTJModel (GPT-J model)
    • GPTNeoConfig configuration class: GPTNeoModel (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXModel (GPT NeoX model)
    • GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseModel (GPT NeoX Japanese model)
    • GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
    • Gemma2Config configuration class: Gemma2Model (Gemma2 model)
    • Gemma3Config configuration class: Gemma3Model (Gemma3ForConditionalGeneration model)
    • Gemma3TextConfig configuration class: Gemma3TextModel (Gemma3ForCausalLM model)
    • Gemma3nAudioConfig configuration class: Gemma3nAudioEncoder (Gemma3nAudioEncoder model)
    • Gemma3nConfig configuration class: Gemma3nModel (Gemma3nForConditionalGeneration model)
    • Gemma3nTextConfig configuration class: Gemma3nTextModel (Gemma3nForCausalLM model)
    • Gemma3nVisionConfig configuration class: TimmWrapperModel (TimmWrapperModel model)
    • GemmaConfig configuration class: GemmaModel (Gemma model)
    • GitConfig configuration class: GitModel (GIT model)
    • Glm4Config configuration class: Glm4Model (GLM4 model)
    • Glm4MoeConfig configuration class: Glm4MoeModel (Glm4MoE model)
    • Glm4vConfig configuration class: Glm4vModel (GLM4V model)
    • Glm4vMoeConfig configuration class: Glm4vMoeModel (GLM4VMOE model)
    • Glm4vMoeTextConfig configuration class: Glm4vMoeTextModel (GLM4VMOE model)
    • Glm4vTextConfig configuration class: Glm4vTextModel (GLM4V model)
    • GlmConfig configuration class: GlmModel (GLM model)
    • GotOcr2Config configuration class: GotOcr2Model (GOT-OCR2 model)
    • GptOssConfig configuration class: GptOssModel (GptOss model)
    • GraniteConfig configuration class: GraniteModel (Granite model)
    • GraniteMoeConfig configuration class: GraniteMoeModel (GraniteMoeMoe model)
    • GraniteMoeHybridConfig configuration class: GraniteMoeHybridModel (GraniteMoeHybrid model)
    • GraniteMoeSharedConfig configuration class: GraniteMoeSharedModel (GraniteMoeSharedMoe model)
    • GraphormerConfig configuration class: GraphormerModel (Graphormer model)
    • GroundingDinoConfig configuration class: GroundingDinoModel (Grounding DINO model)
    • GroupViTConfig configuration class: GroupViTModel (GroupViT model)
    • HGNetV2Config configuration class: HGNetV2Backbone (HGNet-V2 model)
    • HeliumConfig configuration class: HeliumModel (Helium model)
    • HieraConfig configuration class: HieraModel (Hiera model)
    • HubertConfig configuration class: HubertModel (Hubert model)
    • HunYuanDenseV1Config configuration class: HunYuanDenseV1Model (HunYuanDenseV1 model)
    • HunYuanMoEV1Config configuration class: HunYuanMoEV1Model (HunYuanMoeV1 model)
    • IBertConfig configuration class: IBertModel (I-BERT model)
    • IJepaConfig configuration class: IJepaModel (I-JEPA model)
    • Idefics2Config configuration class: Idefics2Model (Idefics2 model)
    • Idefics3Config configuration class: Idefics3Model (Idefics3 model)
    • Idefics3VisionConfig configuration class: Idefics3VisionTransformer (Idefics3VisionTransformer model)
    • IdeficsConfig configuration class: IdeficsModel (IDEFICS model)
    • ImageGPTConfig configuration class: ImageGPTModel (ImageGPT model)
    • InformerConfig configuration class: InformerModel (Informer model)
    • InstructBlipConfig configuration class: InstructBlipModel (InstructBLIP model)
    • InstructBlipVideoConfig configuration class: InstructBlipVideoModel (InstructBlipVideo model)
    • InternVLConfig configuration class: InternVLModel (InternVL model)
    • InternVLVisionConfig configuration class: InternVLVisionModel (InternVLVision model)
    • JambaConfig configuration class: JambaModel (Jamba model)
    • JanusConfig configuration class: JanusModel (Janus model)
    • JetMoeConfig configuration class: JetMoeModel (JetMoe model)
    • JukeboxConfig configuration class: JukeboxModel (Jukebox model)
    • Kosmos2Config configuration class: Kosmos2Model (KOSMOS-2 model)
    • Kosmos2_5Config configuration class: Kosmos2_5Model (KOSMOS-2.5 model)
    • KyutaiSpeechToTextConfig configuration class: KyutaiSpeechToTextModel (KyutaiSpeechToText model)
    • LEDConfig configuration class: LEDModel (LED model)
    • LayoutLMConfig configuration class: LayoutLMModel (LayoutLM model)
    • LayoutLMv2Config configuration class: LayoutLMv2Model (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3Model (LayoutLMv3 model)
    • LevitConfig configuration class: LevitModel (LeViT model)
    • Lfm2Config configuration class: Lfm2Model (Lfm2 model)
    • LightGlueConfig configuration class: LightGlueForKeypointMatching (LightGlue model)
    • LiltConfig configuration class: LiltModel (LiLT model)
    • Llama4Config configuration class: Llama4ForConditionalGeneration (Llama4 model)
    • Llama4TextConfig configuration class: Llama4TextModel (Llama4ForCausalLM model)
    • LlamaConfig configuration class: LlamaModel (LLaMA model)
    • LlavaConfig configuration class: LlavaModel (LLaVa model)
    • LlavaNextConfig configuration class: LlavaNextModel (LLaVA-NeXT model)
    • LlavaNextVideoConfig configuration class: LlavaNextVideoModel (LLaVa-NeXT-Video model)
    • LlavaOnevisionConfig configuration class: LlavaOnevisionModel (LLaVA-Onevision model)
    • LongT5Config configuration class: LongT5Model (LongT5 model)
    • LongformerConfig configuration class: LongformerModel (Longformer model)
    • LukeConfig configuration class: LukeModel (LUKE model)
    • LxmertConfig configuration class: LxmertModel (LXMERT model)
    • M2M100Config configuration class: M2M100Model (M2M100 model)
    • MBartConfig configuration class: MBartModel (mBART model)
    • MCTCTConfig configuration class: MCTCTModel (M-CTC-T model)
    • MLCDVisionConfig configuration class: MLCDVisionModel (MLCD model)
    • MMGroundingDinoConfig configuration class: MMGroundingDinoModel (MM Grounding DINO model)
    • MPNetConfig configuration class: MPNetModel (MPNet model)
    • MT5Config configuration class: MT5Model (MT5 model)
    • Mamba2Config configuration class: Mamba2Model (mamba2 model)
    • MambaConfig configuration class: MambaModel (Mamba model)
    • MarianConfig configuration class: MarianModel (Marian model)
    • MarkupLMConfig configuration class: MarkupLMModel (MarkupLM model)
    • Mask2FormerConfig configuration class: Mask2FormerModel (Mask2Former model)
    • MaskFormerConfig configuration class: MaskFormerModel (MaskFormer model)
    • MaskFormerSwinConfig configuration class: MaskFormerSwinModel (MaskFormerSwin model)
    • MegaConfig configuration class: MegaModel (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertModel (Megatron-BERT model)
    • MetaClip2Config configuration class: MetaClip2Model (MetaCLIP 2 model)
    • MgpstrConfig configuration class: MgpstrForSceneTextRecognition (MGP-STR model)
    • MimiConfig configuration class: MimiModel (Mimi model)
    • MiniMaxConfig configuration class: MiniMaxModel (MiniMax model)
    • Mistral3Config configuration class: Mistral3Model (Mistral3 model)
    • MistralConfig configuration class: MistralModel (Mistral model)
    • MixtralConfig configuration class: MixtralModel (Mixtral model)
    • MllamaConfig configuration class: MllamaModel (Mllama model)
    • MobileBertConfig configuration class: MobileBertModel (MobileBERT model)
    • MobileNetV1Config configuration class: MobileNetV1Model (MobileNetV1 model)
    • MobileNetV2Config configuration class: MobileNetV2Model (MobileNetV2 model)
    • MobileViTConfig configuration class: MobileViTModel (MobileViT model)
    • MobileViTV2Config configuration class: MobileViTV2Model (MobileViTV2 model)
    • ModernBertConfig configuration class: ModernBertModel (ModernBERT model)
    • ModernBertDecoderConfig configuration class: ModernBertDecoderModel (ModernBertDecoder model)
    • MoonshineConfig configuration class: MoonshineModel (Moonshine model)
    • MoshiConfig configuration class: MoshiModel (Moshi model)
    • MptConfig configuration class: MptModel (MPT model)
    • MraConfig configuration class: MraModel (MRA model)
    • MusicgenConfig configuration class: MusicgenModel (MusicGen model)
    • MusicgenMelodyConfig configuration class: MusicgenMelodyModel (MusicGen Melody model)
    • MvpConfig configuration class: MvpModel (MVP model)
    • NatConfig configuration class: NatModel (NAT model)
    • NemotronConfig configuration class: NemotronModel (Nemotron model)
    • NezhaConfig configuration class: NezhaModel (Nezha model)
    • NllbMoeConfig configuration class: NllbMoeModel (NLLB-MOE model)
    • NystromformerConfig configuration class: NystromformerModel (Nyströmformer model)
    • OPTConfig configuration class: OPTModel (OPT model)
    • Olmo2Config configuration class: Olmo2Model (OLMo2 model)
    • OlmoConfig configuration class: OlmoModel (OLMo model)
    • OlmoeConfig configuration class: OlmoeModel (OLMoE model)
    • OmDetTurboConfig configuration class: OmDetTurboForObjectDetection (OmDet-Turbo model)
    • OneFormerConfig configuration class: OneFormerModel (OneFormer model)
    • OpenAIGPTConfig configuration class: OpenAIGPTModel (OpenAI GPT model)
    • OpenLlamaConfig configuration class: OpenLlamaModel (OpenLlama model)
    • Ovis2Config configuration class: Ovis2Model (Ovis2 model)
    • OwlViTConfig configuration class: OwlViTModel (OWL-ViT model)
    • Owlv2Config configuration class: Owlv2Model (OWLv2 model)
    • PLBartConfig configuration class: PLBartModel (PLBart model)
    • PaliGemmaConfig configuration class: PaliGemmaModel (PaliGemma model)
    • PatchTSMixerConfig configuration class: PatchTSMixerModel (PatchTSMixer model)
    • PatchTSTConfig configuration class: PatchTSTModel (PatchTST model)
    • PegasusConfig configuration class: PegasusModel (Pegasus model)
    • PegasusXConfig configuration class: PegasusXModel (PEGASUS-X model)
    • PerceiverConfig configuration class: PerceiverModel (Perceiver model)
    • PerceptionLMConfig configuration class: PerceptionLMModel (PerceptionLM model)
    • PersimmonConfig configuration class: PersimmonModel (Persimmon model)
    • Phi3Config configuration class: Phi3Model (Phi3 model)
    • Phi4MultimodalConfig configuration class: Phi4MultimodalModel (Phi4Multimodal model)
    • PhiConfig configuration class: PhiModel (Phi model)
    • PhimoeConfig configuration class: PhimoeModel (Phimoe model)
    • PixtralVisionConfig configuration class: PixtralVisionModel (Pixtral model)
    • PoolFormerConfig configuration class: PoolFormerModel (PoolFormer model)
    • ProphetNetConfig configuration class: ProphetNetModel (ProphetNet model)
    • PvtConfig configuration class: PvtModel (PVT model)
    • PvtV2Config configuration class: PvtV2Model (PVTv2 model)
    • QDQBertConfig configuration class: QDQBertModel (QDQBert model)
    • Qwen2AudioEncoderConfig configuration class: Qwen2AudioEncoder (Qwen2AudioEncoder model)
    • Qwen2Config configuration class: Qwen2Model (Qwen2 model)
    • Qwen2MoeConfig configuration class: Qwen2MoeModel (Qwen2MoE model)
    • Qwen2VLConfig configuration class: Qwen2VLModel (Qwen2VL model)
    • Qwen2VLTextConfig configuration class: Qwen2VLTextModel (Qwen2VL model)
    • Qwen2_5_VLConfig configuration class: Qwen2_5_VLModel (Qwen2_5_VL model)
    • Qwen2_5_VLTextConfig configuration class: Qwen2_5_VLTextModel (Qwen2_5_VL model)
    • Qwen3Config configuration class: Qwen3Model (Qwen3 model)
    • Qwen3MoeConfig configuration class: Qwen3MoeModel (Qwen3MoE model)
    • RTDetrConfig configuration class: RTDetrModel (RT-DETR model)
    • RTDetrV2Config configuration class: RTDetrV2Model (RT-DETRv2 model)
    • RecurrentGemmaConfig configuration class: RecurrentGemmaModel (RecurrentGemma model)
    • ReformerConfig configuration class: ReformerModel (Reformer model)
    • RegNetConfig configuration class: RegNetModel (RegNet model)
    • RemBertConfig configuration class: RemBertModel (RemBERT model)
    • ResNetConfig configuration class: ResNetModel (ResNet model)
    • RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
    • RoCBertConfig configuration class: RoCBertModel (RoCBert model)
    • RoFormerConfig configuration class: RoFormerModel (RoFormer model)
    • RobertaConfig configuration class: RobertaModel (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
    • RwkvConfig configuration class: RwkvModel (RWKV model)
    • SEWConfig configuration class: SEWModel (SEW model)
    • SEWDConfig configuration class: SEWDModel (SEW-D model)
    • Sam2Config configuration class: Sam2Model (SAM2 model)
    • Sam2HieraDetConfig configuration class: Sam2HieraDetModel (Sam2HieraDetModel model)
    • Sam2VideoConfig configuration class: Sam2VideoModel (Sam2VideoModel model)
    • Sam2VisionConfig configuration class: Sam2VisionModel (Sam2VisionModel model)
    • SamConfig configuration class: SamModel (SAM model)
    • SamHQConfig configuration class: SamHQModel (SAM-HQ model)
    • SamHQVisionConfig configuration class: SamHQVisionModel (SamHQVisionModel model)
    • SamVisionConfig configuration class: SamVisionModel (SamVisionModel model)
    • SeamlessM4TConfig configuration class: SeamlessM4TModel (SeamlessM4T model)
    • SeamlessM4Tv2Config configuration class: SeamlessM4Tv2Model (SeamlessM4Tv2 model)
    • SeedOssConfig configuration class: SeedOssModel (SeedOss model)
    • SegGptConfig configuration class: SegGptModel (SegGPT model)
    • SegformerConfig configuration class: SegformerModel (SegFormer model)
    • Siglip2Config configuration class: Siglip2Model (SigLIP2 model)
    • SiglipConfig configuration class: SiglipModel (SigLIP model)
    • SiglipVisionConfig configuration class: SiglipVisionModel (SiglipVisionModel model)
    • SmolLM3Config configuration class: SmolLM3Model (SmolLM3 model)
    • SmolVLMConfig configuration class: SmolVLMModel (SmolVLM model)
    • SmolVLMVisionConfig configuration class: SmolVLMVisionTransformer (SmolVLMVisionTransformer model)
    • Speech2TextConfig configuration class: Speech2TextModel (Speech2Text model)
    • SpeechT5Config configuration class: SpeechT5Model (SpeechT5 model)
    • SplinterConfig configuration class: SplinterModel (Splinter model)
    • SqueezeBertConfig configuration class: SqueezeBertModel (SqueezeBERT model)
    • StableLmConfig configuration class: StableLmModel (StableLm model)
    • Starcoder2Config configuration class: Starcoder2Model (Starcoder2 model)
    • SwiftFormerConfig configuration class: SwiftFormerModel (SwiftFormer model)
    • Swin2SRConfig configuration class: Swin2SRModel (Swin2SR model)
    • SwinConfig configuration class: SwinModel (Swin Transformer model)
    • Swinv2Config configuration class: Swinv2Model (Swin Transformer V2 model)
    • SwitchTransformersConfig configuration class: SwitchTransformersModel (SwitchTransformers model)
    • T5Config configuration class: T5Model (T5 model)
    • T5GemmaConfig configuration class: T5GemmaModel (T5Gemma model)
    • TableTransformerConfig configuration class: TableTransformerModel (Table Transformer model)
    • TapasConfig configuration class: TapasModel (TAPAS model)
    • TextNetConfig configuration class: TextNetModel (TextNet model)
    • TimeSeriesTransformerConfig configuration class: TimeSeriesTransformerModel (Time Series Transformer model)
    • TimesFmConfig configuration class: TimesFmModel (TimesFm model)
    • TimesformerConfig configuration class: TimesformerModel (TimeSformer model)
    • TimmBackboneConfig configuration class: TimmBackbone (TimmBackbone model)
    • TimmWrapperConfig configuration class: TimmWrapperModel (TimmWrapperModel model)
    • TrajectoryTransformerConfig configuration class: TrajectoryTransformerModel (Trajectory Transformer model)
    • TransfoXLConfig configuration class: TransfoXLModel (Transformer-XL model)
    • TvltConfig configuration class: TvltModel (TVLT model)
    • TvpConfig configuration class: TvpModel (TVP model)
    • UMT5Config configuration class: UMT5Model (UMT5 model)
    • UdopConfig configuration class: UdopModel (UDOP model)
    • UniSpeechConfig configuration class: UniSpeechModel (UniSpeech model)
    • UniSpeechSatConfig configuration class: UniSpeechSatModel (UniSpeechSat model)
    • UnivNetConfig configuration class: UnivNetModel (UnivNet model)
    • VJEPA2Config configuration class: VJEPA2Model (VJEPA2Model model)
    • VanConfig configuration class: VanModel (VAN model)
    • ViTConfig configuration class: ViTModel (ViT model)
    • ViTHybridConfig configuration class: ViTHybridModel (ViT Hybrid model)
    • ViTMAEConfig configuration class: ViTMAEModel (ViTMAE model)
    • ViTMSNConfig configuration class: ViTMSNModel (ViTMSN model)
    • VideoLlavaConfig configuration class: VideoLlavaModel (VideoLlava model)
    • VideoMAEConfig configuration class: VideoMAEModel (VideoMAE model)
    • ViltConfig configuration class: ViltModel (ViLT model)
    • VipLlavaConfig configuration class: VipLlavaModel (VipLlava model)
    • VisionTextDualEncoderConfig configuration class: VisionTextDualEncoderModel (VisionTextDualEncoder model)
    • VisualBertConfig configuration class: VisualBertModel (VisualBERT model)
    • VitDetConfig configuration class: VitDetModel (VitDet model)
    • VitsConfig configuration class: VitsModel (VITS model)
    • VivitConfig configuration class: VivitModel (ViViT model)
    • VoxtralConfig configuration class: VoxtralForConditionalGeneration (Voxtral model)
    • VoxtralEncoderConfig configuration class: VoxtralEncoder (Voxtral Encoder model)
    • Wav2Vec2BertConfig configuration class: Wav2Vec2BertModel (Wav2Vec2-BERT model)
    • Wav2Vec2Config configuration class: Wav2Vec2Model (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMModel (WavLM model)
    • WhisperConfig configuration class: WhisperModel (Whisper model)
    • XCLIPConfig configuration class: XCLIPModel (X-CLIP model)
    • XGLMConfig configuration class: XGLMModel (XGLM model)
    • XLMConfig configuration class: XLMModel (XLM model)
    • XLMProphetNetConfig configuration class: XLMProphetNetModel (XLM-ProphetNet model)
    • XLMRobertaConfig configuration class: XLMRobertaModel (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLModel (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetModel (XLNet model)
    • XcodecConfig configuration class: XcodecModel (X-CODEC model)
    • XmodConfig configuration class: XmodModel (X-MOD model)
    • YolosConfig configuration class: YolosModel (YOLOS model)
    • YosoConfig configuration class: YosoModel (YOSO model)
    • Zamba2Config configuration class: Zamba2Model (Zamba2 model)
    • ZambaConfig configuration class: ZambaModel (Zamba model)
    • xLSTMConfig configuration class: xLSTMModel (xLSTM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used when available (torch>=2.1.1); otherwise, the manual "eager" implementation is used.
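The default described for attn_implementation can be read as a small decision rule. The sketch below is a hypothetical helper written for illustration, not the library's internal selection code. In particular, whether a given model supports SDPA is passed in as a flag (sdpa_supported) rather than detected, and the availability checks for flash_attention_2 are not modeled.

```python
import re
from typing import Optional

# Values listed in the attn_implementation description.
VALID_IMPLEMENTATIONS = {"eager", "sdpa", "flash_attention_2"}


def parse_version(version: str) -> tuple:
    """Turn '2.4.0+cu121' into (2, 4, 0); local build tags after '+' are ignored."""
    core = version.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def resolve_attn_implementation(
    requested: Optional[str], torch_version: str, sdpa_supported: bool
) -> str:
    """Hypothetical helper mirroring the documented default, not library code."""
    if requested is not None:
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested  # An explicit choice is used as-is.
    # No explicit choice: SDPA if the model supports it and torch>=2.1.1.
    if sdpa_supported and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"  # Otherwise fall back to the manual implementation.
```

An explicit value always wins; the version check only matters when attn_implementation is left unset.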

Instantiates one of the base model classes of the library from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModel

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModel.from_config(config)
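As the parameter list above states, from_config chooses the model class from the configuration class and builds it without loading any weights. A minimal pure-Python sketch of that dispatch follows; ToyConfig, BertLikeConfig, BertLikeModel, toy_from_config and the other names are hypothetical stand-ins, not library classes.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


class BertLikeConfig(ToyConfig):
    pass


class GPTLikeConfig(ToyConfig):
    pass


class ToyModel:
    """Hypothetical stand-in for a model class built from a config."""

    def __init__(self, config):
        self.config = config
        # Weights would be randomly initialised here; no checkpoint is read.
        self.weights_loaded = False


class BertLikeModel(ToyModel):
    pass


class GPTLikeModel(ToyModel):
    pass


# Keyed by configuration class, like the list above.
MODEL_MAPPING = {BertLikeConfig: BertLikeModel, GPTLikeConfig: GPTLikeModel}


def toy_from_config(config):
    """Pick the model class from type(config) and build it without weights."""
    model_class = MODEL_MAPPING.get(type(config))
    if model_class is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    return model_class(config)
```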

from_pretrained

( pretrained_model_name_or_path, *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model's __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
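The two kwargs cases above can be sketched as follows. ToyConfig and route_kwargs are hypothetical names; the real method loads the configuration from pretrained_model_name_or_path, which is simulated here with a default ToyConfig().

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical configuration with two attributes."""

    hidden_size: int = 768
    output_attentions: bool = False


def route_kwargs(config: Optional[ToyConfig] = None, **kwargs):
    """Return (config, model_init_kwargs) following the two documented cases."""
    if config is not None:
        # Case 1: config given explicitly, so every kwarg goes to the model's __init__.
        return config, dict(kwargs)
    # Case 2: config loaded automatically (simulated here with the defaults).
    loaded = ToyConfig()
    attr_names = {f.name for f in fields(ToyConfig)}
    overrides = {k: v for k, v in kwargs.items() if k in attr_names}
    remaining = {k: v for k, v in kwargs.items() if k not in attr_names}
    # Matching keys override config attributes; the rest go to the model's __init__.
    return replace(loaded, **overrides), remaining
```

Note that in the first case output_attentions=True is not applied to the config; it is forwarded to the model unchanged, which is why the configuration should already be fully updated when config is passed.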

Instantiate one of the base model classes of the library from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • aimv2: Aimv2Model (AIMv2 model)
  • aimv2_vision_model: Aimv2VisionModel (Aimv2VisionModel model)
  • albert: AlbertModel (ALBERT model)
  • align: AlignModel (ALIGN model)
  • altclip: AltCLIPModel (AltCLIP model)
  • apertus: ApertusModel (Apertus model)
  • arcee: ArceeModel (Arcee model)
  • aria: AriaModel (Aria model)
  • aria_text: AriaTextModel (AriaText model)
  • audio-spectrogram-transformer: ASTModel (Audio Spectrogram Transformer model)
  • autoformer: AutoformerModel (Autoformer model)
  • aya_vision: AyaVisionModel (AyaVision model)
  • bamba: BambaModel (Bamba model)
  • bark: BarkModel (Bark model)
  • bart: BartModel (BART model)
  • beit: BeitModel (BEiT model)
  • bert: BertModel (BERT model)
  • bert-generation: BertGenerationEncoder (Bert Generation model)
  • big_bird: BigBirdModel (BigBird model)
  • bigbird_pegasus: BigBirdPegasusModel (BigBird-Pegasus model)
  • biogpt: BioGptModel (BioGpt model)
  • bit: BitModel (BiT model)
  • bitnet: BitNetModel (BitNet model)
  • blenderbot: BlenderbotModel (Blenderbot model)
  • blenderbot-small: BlenderbotSmallModel (BlenderbotSmall model)
  • blip: BlipModel (BLIP model)
  • blip-2: Blip2Model (BLIP-2 model)
  • blip_2_qformer: Blip2QFormerModel (BLIP-2 QFormer model)
  • bloom: BloomModel (BLOOM model)
  • bridgetower: BridgeTowerModel (BridgeTower model)
  • bros: BrosModel (BROS model)
  • camembert: CamembertModel (CamemBERT model)
  • canine: CanineModel (CANINE model)
  • chameleon: ChameleonModel (Chameleon model)
  • chinese_clip: ChineseCLIPModel (Chinese-CLIP model)
  • chinese_clip_vision_model: ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
  • clap: ClapModel (CLAP model)
  • clip: CLIPModel (CLIP model)
  • clip_text_model: CLIPTextModel (CLIPTextModel model)
  • clip_vision_model: CLIPVisionModel (CLIPVisionModel model)
  • clipseg: CLIPSegModel (CLIPSeg model)
  • clvp: ClvpModelForConditionalGeneration (CLVP model)
  • code_llama: LlamaModel (CodeLlama model)
  • codegen: CodeGenModel (CodeGen model)
  • cohere: CohereModel (Cohere model)
  • cohere2: Cohere2Model (Cohere2 model)
  • cohere2_vision: Cohere2VisionModel (Cohere2Vision model)
  • conditional_detr: ConditionalDetrModel (Conditional DETR model)
  • convbert: ConvBertModel (ConvBERT model)
  • convnext: ConvNextModel (ConvNeXT model)
  • convnextv2: ConvNextV2Model (ConvNeXTV2 model)
  • cpmant: CpmAntModel (CPM-Ant model)
  • csm: CsmForConditionalGeneration (CSM model)
  • ctrl: CTRLModel (CTRL model)
  • cvt: CvtModel (CvT model)
  • d_fine: DFineModel (D-FINE model)
  • dab-detr: DabDetrModel (DAB-DETR model)
  • dac: DacModel (DAC model)
  • data2vec-audio: Data2VecAudioModel (Data2VecAudio model)
  • data2vec-text: Data2VecTextModel (Data2VecText model)
  • data2vec-vision: Data2VecVisionModel (Data2VecVision model)
  • dbrx: DbrxModel (DBRX model)
  • deberta: DebertaModel (DeBERTa model)
  • deberta-v2: DebertaV2Model (DeBERTa-v2 model)
  • decision_transformer: DecisionTransformerModel (Decision Transformer model)
  • deepseek_v2: DeepseekV2Model (DeepSeek-V2 model)
  • deepseek_v3: DeepseekV3Model (DeepSeek-V3 model)
  • deepseek_vl: DeepseekVLModel (DeepseekVL model)
  • deepseek_vl_hybrid: DeepseekVLHybridModel (DeepseekVLHybrid model)
  • deformable_detr: DeformableDetrModel (Deformable DETR model)
  • deit: DeiTModel (DeiT model)
  • depth_pro: DepthProModel (DepthPro model)
  • deta: DetaModel (DETA model)
  • detr: DetrModel (DETR model)
  • dia: DiaModel (Dia model)
  • diffllama: DiffLlamaModel (DiffLlama model)
  • dinat: DinatModel (DiNAT model)
  • dinov2: Dinov2Model (DINOv2 model)
  • dinov2_with_registers: Dinov2WithRegistersModel (DINOv2 with Registers model)
  • dinov3_convnext: DINOv3ConvNextModel (DINOv3 ConvNext model)
  • dinov3_vit: DINOv3ViTModel (DINOv3 ViT model)
  • distilbert: DistilBertModel (DistilBERT model)
  • doge: DogeModel (Doge model)
  • donut-swin: DonutSwinModel (DonutSwin model)
  • dots1: Dots1Model (dots1 model)
  • dpr: DPRQuestionEncoder (DPR model)
  • dpt: DPTModel (DPT model)
  • efficientformer: EfficientFormerModel (EfficientFormer model)
  • efficientloftr: EfficientLoFTRModel (EfficientLoFTR model)
  • efficientnet: EfficientNetModel (EfficientNet model)
  • electra: ElectraModel (ELECTRA model)
  • emu3: Emu3Model (Emu3 model)
  • encodec: EncodecModel (EnCodec model)
  • ernie: ErnieModel (ERNIE model)
  • ernie4_5: Ernie4_5Model (Ernie4_5 model)
  • ernie4_5_moe: Ernie4_5_MoeModel (Ernie4_5_MoE model)
  • ernie_m: ErnieMModel (ErnieM model)
  • esm: EsmModel (ESM model)
  • evolla: EvollaModel (Evolla model)
  • exaone4: Exaone4Model (EXAONE-4.0 model)
  • falcon: FalconModel (Falcon model)
  • falcon_h1: FalconH1Model (FalconH1 model)
  • falcon_mamba: FalconMambaModel (FalconMamba model)
  • fastspeech2_conformer: FastSpeech2ConformerModel (FastSpeech2Conformer model)
  • fastspeech2_conformer_with_hifigan: FastSpeech2ConformerWithHifiGan (FastSpeech2ConformerWithHifiGan model)
  • flaubert: FlaubertModel (FlauBERT model)
  • flava: FlavaModel (FLAVA model)
  • florence2: Florence2Model (Florence2 model)
  • fnet: FNetModel (FNet model)
  • focalnet: FocalNetModel (FocalNet model)
  • fsmt: FSMTModel (FairSeq Machine-Translation model)
  • funnel: FunnelModel or FunnelBaseModel (Funnel Transformer model)
  • fuyu: FuyuModel (Fuyu model)
  • gemma: GemmaModel (Gemma model)
  • gemma2: Gemma2Model (Gemma2 model)
  • gemma3: Gemma3Model (Gemma3ForConditionalGeneration model)
  • gemma3_text: Gemma3TextModel (Gemma3ForCausalLM model)
  • gemma3n: Gemma3nModel (Gemma3nForConditionalGeneration model)
  • gemma3n_audio: Gemma3nAudioEncoder (Gemma3nAudioEncoder model)
  • gemma3n_text: Gemma3nTextModel (Gemma3nForCausalLM model)
  • gemma3n_vision: TimmWrapperModel (TimmWrapperModel model)
  • git: GitModel (GIT model)
  • glm: GlmModel (GLM model)
  • glm4: Glm4Model (GLM4 model)
  • glm4_moe: Glm4MoeModel (Glm4MoE model)
  • glm4v: Glm4vModel (GLM4V model)
  • glm4v_moe: Glm4vMoeModel (GLM4VMOE model)
  • glm4v_moe_text: Glm4vMoeTextModel (GLM4VMOE model)
  • glm4v_text: Glm4vTextModel (GLM4V model)
  • glpn: GLPNModel (GLPN model)
  • got_ocr2: GotOcr2Model (GOT-OCR2 model)
  • gpt-sw3: GPT2Model (GPT-Sw3 model)
  • gpt2: GPT2Model (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeModel (GPTBigCode model)
  • gpt_neo: GPTNeoModel (GPT Neo model)
  • gpt_neox: GPTNeoXModel (GPT NeoX model)
  • gpt_neox_japanese: GPTNeoXJapaneseModel (GPT NeoX Japanese model)
  • gpt_oss: GptOssModel (GptOss model)
  • gptj: GPTJModel (GPT-J model)
  • gptsan-japanese: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
  • granite: GraniteModel (Granite model)
  • granitemoe: GraniteMoeModel (GraniteMoeMoe model)
  • granitemoehybrid: GraniteMoeHybridModel (GraniteMoeHybrid model)
  • granitemoeshared: GraniteMoeSharedModel (GraniteMoeSharedMoe model)
  • graphormer: GraphormerModel (Graphormer model)
  • grounding-dino: GroundingDinoModel (Grounding DINO model)
  • groupvit: GroupViTModel (GroupViT model)
  • helium: HeliumModel (Helium model)
  • hgnet_v2: HGNetV2Backbone (HGNet-V2 model)
  • hiera: HieraModel (Hiera model)
  • hubert: HubertModel (Hubert model)
  • hunyuan_v1_dense: HunYuanDenseV1Model (HunYuanDenseV1 model)
  • hunyuan_v1_moe: HunYuanMoEV1Model (HunYuanMoeV1 model)
  • ibert: IBertModel (I-BERT model)
  • idefics: IdeficsModel (IDEFICS model)
  • idefics2: Idefics2Model (Idefics2 model)
  • idefics3: Idefics3Model (Idefics3 model)
  • idefics3_vision: Idefics3VisionTransformer (Idefics3VisionTransformer model)
  • ijepa: IJepaModel (I-JEPA model)
  • imagegpt: ImageGPTModel (ImageGPT model)
  • informer: InformerModel (Informer model)
  • instructblip: InstructBlipModel (InstructBLIP model)
  • instructblipvideo: InstructBlipVideoModel (InstructBlipVideo model)
  • internvl: InternVLModel (InternVL model)
  • internvl_vision: InternVLVisionModel (InternVLVision model)
  • jamba: JambaModel (Jamba model)
  • janus: JanusModel (Janus model)
  • jetmoe: JetMoeModel (JetMoe model)
  • jukebox: JukeboxModel (Jukebox model)
  • kosmos-2: Kosmos2Model (KOSMOS-2 model)
  • kosmos-2.5: Kosmos2_5Model (KOSMOS-2.5 model)
  • kyutai_speech_to_text: KyutaiSpeechToTextModel (KyutaiSpeechToText model)
  • layoutlm: LayoutLMModel (LayoutLM model)
  • layoutlmv2: LayoutLMv2Model (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3Model (LayoutLMv3 model)
  • led: LEDModel (LED model)
  • levit: LevitModel (LeViT model)
  • lfm2: Lfm2Model (Lfm2 model)
  • lightglue: LightGlueForKeypointMatching (LightGlue model)
  • lilt: LiltModel (LiLT model)
  • llama: LlamaModel (LLaMA model)
  • llama4: Llama4ForConditionalGeneration (Llama4 model)
  • llama4_text: Llama4TextModel (Llama4ForCausalLM model)
  • llava: LlavaModel (LLaVa model)
  • llava_next: LlavaNextModel (LLaVA-NeXT model)
  • llava_next_video: LlavaNextVideoModel (LLaVa-NeXT-Video model)
  • llava_onevision: LlavaOnevisionModel (LLaVA-Onevision model)
  • longformer: LongformerModel (Longformer model)
  • longt5: LongT5Model (LongT5 model)
  • luke: LukeModel (LUKE model)
  • lxmert: LxmertModel (LXMERT model)
  • m2m_100: M2M100Model (M2M100 model)
  • mamba: MambaModel (Mamba model)
  • mamba2: Mamba2Model (mamba2 model)
  • marian: MarianModel (Marian model)
  • markuplm: MarkupLMModel (MarkupLM model)
  • mask2former: Mask2FormerModel (Mask2Former model)
  • maskformer: MaskFormerModel (MaskFormer model)
  • maskformer-swin: MaskFormerSwinModel (MaskFormerSwin model)
  • mbart: MBartModel (mBART model)
  • mctct: MCTCTModel (M-CTC-T model)
  • mega: MegaModel (MEGA model)
  • megatron-bert: MegatronBertModel (Megatron-BERT model)
  • metaclip_2: MetaClip2Model (MetaCLIP 2 model)
  • mgp-str: MgpstrForSceneTextRecognition (MGP-STR model)
  • mimi: MimiModel (Mimi model)
  • minimax: MiniMaxModel (MiniMax model)
  • mistral: MistralModel (Mistral model)
  • mistral3: Mistral3Model (Mistral3 model)
  • mixtral: MixtralModel (Mixtral model)
  • mlcd: MLCDVisionModel (MLCD model)
  • mllama: MllamaModel (Mllama model)
  • mm-grounding-dino: MMGroundingDinoModel (MM Grounding DINO model)
  • mobilebert: MobileBertModel (MobileBERT model)
  • mobilenet_v1: MobileNetV1Model (MobileNetV1 model)
  • mobilenet_v2: MobileNetV2Model (MobileNetV2 model)
  • mobilevit: MobileViTModel (MobileViT model)
  • mobilevitv2: MobileViTV2Model (MobileViTV2 model)
  • modernbert: ModernBertModel (ModernBERT model)
  • modernbert-decoder: ModernBertDecoderModel (ModernBertDecoder model)
  • moonshine: MoonshineModel (Moonshine model)
  • moshi: MoshiModel (Moshi model)
  • mpnet: MPNetModel (MPNet model)
  • mpt: MptModel (MPT model)
  • mra: MraModel (MRA model)
  • mt5: MT5Model (MT5 model)
  • musicgen: MusicgenModel (MusicGen model)
  • musicgen_melody: MusicgenMelodyModel (MusicGen Melody model)
  • mvp: MvpModel (MVP model)
  • nat: NatModel (NAT model)
  • nemotron: NemotronModel (Nemotron model)
  • nezha: NezhaModel (Nezha model)
  • nllb-moe: NllbMoeModel (NLLB-MOE model)
  • nystromformer: NystromformerModel (Nyströmformer model)
  • olmo: OlmoModel (OLMo model)
  • olmo2: Olmo2Model (OLMo2 model)
  • olmoe: OlmoeModel (OLMoE model)
  • omdet-turbo: OmDetTurboForObjectDetection (OmDet-Turbo model)
  • oneformer: OneFormerModel (OneFormer model)
  • open-llama: OpenLlamaModel (OpenLlama model)
  • openai-gpt: OpenAIGPTModel (OpenAI GPT model)
  • opt: OPTModel (OPT model)
  • ovis2: Ovis2Model (Ovis2 model)
  • owlv2: Owlv2Model (OWLv2 model)
  • owlvit: OwlViTModel (OWL-ViT model)
  • paligemma: PaliGemmaModel (PaliGemma model)
  • patchtsmixer: PatchTSMixerModel (PatchTSMixer model)
  • patchtst: PatchTSTModel (PatchTST model)
  • pegasus: PegasusModel (Pegasus model)
  • pegasus_x: PegasusXModel (PEGASUS-X model)
  • perceiver: PerceiverModel (Perceiver model)
  • perception_encoder: PerceptionEncoder (PerceptionEncoder model)
  • perception_lm: PerceptionLMModel (PerceptionLM model)
  • persimmon: PersimmonModel (Persimmon model)
  • phi: PhiModel (Phi model)
  • phi3: Phi3Model (Phi3 model)
  • phi4_multimodal: Phi4MultimodalModel (Phi4Multimodal model)
  • phimoe: PhimoeModel (Phimoe model)
  • pixtral: PixtralVisionModel (Pixtral model)
  • plbart: PLBartModel (PLBart model)
  • poolformer: PoolFormerModel (PoolFormer model)
  • prophetnet: ProphetNetModel (ProphetNet model)
  • pvt: PvtModel (PVT model)
  • pvt_v2: PvtV2Model (PVTv2 model)
  • qdqbert: QDQBertModel (QDQBert model)
  • qwen2: Qwen2Model (Qwen2 model)
  • qwen2_5_vl: Qwen2_5_VLModel (Qwen2_5_VL model)
  • qwen2_5_vl_text: Qwen2_5_VLTextModel (Qwen2_5_VL model)
  • qwen2_audio_encoder: Qwen2AudioEncoder (Qwen2AudioEncoder model)
  • qwen2_moe: Qwen2MoeModel (Qwen2MoE model)
  • qwen2_vl: Qwen2VLModel (Qwen2VL model)
  • qwen2_vl_text: Qwen2VLTextModel (Qwen2VL model)
  • qwen3: Qwen3Model (Qwen3 model)
  • qwen3_moe: Qwen3MoeModel (Qwen3MoE model)
  • recurrent_gemma: RecurrentGemmaModel (RecurrentGemma model)
  • reformer: ReformerModel (Reformer model)
  • regnet: RegNetModel (RegNet model)
  • rembert: RemBertModel (RemBERT model)
  • resnet: ResNetModel (ResNet model)
  • retribert: RetriBertModel (RetriBERT model)
  • roberta: RobertaModel (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertModel (RoCBert model)
  • roformer: RoFormerModel (RoFormer model)
  • rt_detr: RTDetrModel (RT-DETR model)
  • rt_detr_v2: RTDetrV2Model (RT-DETRv2 model)
  • rwkv: RwkvModel (RWKV model)
  • sam: SamModel (SAM model)
  • sam2: Sam2Model (SAM2 model)
  • sam2_hiera_det_model: Sam2HieraDetModel (Sam2HieraDetModel model)
  • sam2_video: Sam2VideoModel (Sam2VideoModel model)
  • sam2_vision_model: Sam2VisionModel (Sam2VisionModel model)
  • sam_hq: SamHQModel (SAM-HQ model)
  • sam_hq_vision_model: SamHQVisionModel (SamHQVisionModel model)
  • sam_vision_model: SamVisionModel (SamVisionModel model)
  • seamless_m4t: SeamlessM4TModel (SeamlessM4T model)
  • seamless_m4t_v2: SeamlessM4Tv2Model (SeamlessM4Tv2 model)
  • seed_oss: SeedOssModel (SeedOss model)
  • segformer: SegformerModel (SegFormer model)
  • seggpt: SegGptModel (SegGPT model)
  • sew: SEWModel (SEW model)
  • sew-d: SEWDModel (SEW-D model)
  • siglip: SiglipModel (SigLIP model)
  • siglip2: Siglip2Model (SigLIP2 model)
  • siglip_vision_model: SiglipVisionModel (SiglipVisionModel model)
  • smollm3: SmolLM3Model (SmolLM3 model)
  • smolvlm: SmolVLMModel (SmolVLM model)
  • smolvlm_vision: SmolVLMVisionTransformer (SmolVLMVisionTransformer model)
  • speech_to_text: Speech2TextModel (Speech2Text model)
  • speecht5: SpeechT5Model (SpeechT5 model)
  • splinter: SplinterModel (Splinter model)
  • squeezebert: SqueezeBertModel (SqueezeBERT model)
  • stablelm: StableLmModel (StableLm model)
  • starcoder2: Starcoder2Model (Starcoder2 model)
  • swiftformer: SwiftFormerModel (SwiftFormer model)
  • swin: SwinModel (Swin Transformer model)
  • swin2sr: Swin2SRModel (Swin2SR model)
  • swinv2: Swinv2Model (Swin Transformer V2 model)
  • switch_transformers: SwitchTransformersModel (SwitchTransformers model)
  • t5: T5Model (T5 model)
  • t5gemma: T5GemmaModel (T5Gemma model)
  • table-transformer: TableTransformerModel (Table Transformer model)
  • tapas: TapasModel (TAPAS model)
  • textnet: TextNetModel (TextNet model)
  • time_series_transformer: TimeSeriesTransformerModel (Time Series Transformer model)
  • timesfm: TimesFmModel (TimesFm model)
  • timesformer: TimesformerModel (TimeSformer model)
  • timm_backbone: TimmBackbone (TimmBackbone model)
  • timm_wrapper: TimmWrapperModel (TimmWrapperModel model)
  • trajectory_transformer: TrajectoryTransformerModel (Trajectory Transformer model)
  • transfo-xl: TransfoXLModel (Transformer-XL model)
  • tvlt: TvltModel (TVLT model)
  • tvp: TvpModel (TVP model)
  • udop: UdopModel (UDOP model)
  • umt5: UMT5Model (UMT5 model)
  • unispeech: UniSpeechModel (UniSpeech model)
  • unispeech-sat: UniSpeechSatModel (UniSpeechSat model)
  • univnet: UnivNetModel (UnivNet model)
  • van: VanModel (VAN model)
  • video_llava: VideoLlavaModel (VideoLlava model)
  • videomae: VideoMAEModel (VideoMAE model)
  • vilt: ViltModel (ViLT model)
  • vipllava: VipLlavaModel (VipLlava model)
  • vision-text-dual-encoder: VisionTextDualEncoderModel (VisionTextDualEncoder model)
  • visual_bert: VisualBertModel (VisualBERT model)
  • vit: ViTModel (ViT model)
  • vit_hybrid: ViTHybridModel (ViT Hybrid model)
  • vit_mae: ViTMAEModel (ViTMAE model)
  • vit_msn: ViTMSNModel (ViTMSN model)
  • vitdet: VitDetModel (VitDet model)
  • vits: VitsModel (VITS model)
  • vivit: VivitModel (ViViT model)
  • vjepa2: VJEPA2Model (VJEPA2Model model)
  • voxtral: VoxtralForConditionalGeneration (Voxtral model)
  • voxtral_encoder: VoxtralEncoder (Voxtral Encoder model)
  • wav2vec2: Wav2Vec2Model (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertModel (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
  • wavlm: WavLMModel (WavLM model)
  • whisper: WhisperModel (Whisper model)
  • xclip: XCLIPModel (X-CLIP model)
  • xcodec: XcodecModel (X-CODEC model)
  • xglm: XGLMModel (XGLM model)
  • xlm: XLMModel (XLM model)
  • xlm-prophetnet: XLMProphetNetModel (XLM-ProphetNet model)
  • xlm-roberta: XLMRobertaModel (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLModel (XLM-RoBERTa-XL model)
  • xlnet: XLNetModel (XLNet model)
  • xlstm: xLSTMModel (xLSTM model)
  • xmod: XmodModel (X-MOD model)
  • yolos: YolosModel (YOLOS model)
  • yoso: YosoModel (YOSO model)
  • zamba: ZambaModel (Zamba model)
  • zamba2: Zamba2Model (Zamba2 model)
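The selection rule above (use model_type when the config has one, otherwise fall back to pattern matching on the name or path) can be sketched like this. The table is a tiny hypothetical subset of the list above, and checking longer keys first, so that "xlm-roberta" wins over "roberta" and "bert", is an assumption of this sketch; the text above only says "pattern matching".

```python
from typing import Optional

# Tiny hypothetical subset of the model_type -> model class table above.
MODEL_TYPE_TO_CLASS = {
    "bert": "BertModel",
    "roberta": "RobertaModel",
    "xlm-roberta": "XLMRobertaModel",
    "gpt2": "GPT2Model",
}


def select_model_class(model_type: Optional[str], name_or_path: str) -> str:
    """Return the model class name for a config's model_type, or guess from the name."""
    if model_type is not None:
        return MODEL_TYPE_TO_CLASS[model_type]
    # Fallback: substring match, longest keys first so that "xlm-roberta"
    # is tried before "roberta", which in turn is tried before "bert".
    for key in sorted(MODEL_TYPE_TO_CLASS, key=len, reverse=True):
        if key in name_or_path:
            return MODEL_TYPE_TO_CLASS[key]
    raise ValueError(f"Cannot infer a model class from {name_or_path!r}")
```

Ordering matters because "bert" is a substring of "roberta": a shortest-first or unordered scan could pick the wrong architecture for a RoBERTa checkpoint whose config lacks model_type.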

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModel.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModel

class transformers.TFAutoModel

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertModel (ALBERT model)
    • BartConfig configuration class: TFBartModel (BART model)
    • BertConfig configuration class: TFBertModel (BERT model)
    • BlenderbotConfig configuration class: TFBlenderbotModel (Blenderbot model)
    • BlenderbotSmallConfig configuration class: TFBlenderbotSmallModel (BlenderbotSmall model)
    • BlipConfig configuration class: TFBlipModel (BLIP model)
    • CLIPConfig configuration class: TFCLIPModel (CLIP model)
    • CTRLConfig configuration class: TFCTRLModel (CTRL model)
    • CamembertConfig configuration class: TFCamembertModel (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertModel (ConvBERT model)
    • ConvNextConfig configuration class: TFConvNextModel (ConvNeXT model)
    • ConvNextV2Config configuration class: TFConvNextV2Model (ConvNeXTV2 model)
    • CvtConfig configuration class: TFCvtModel (CvT model)
    • DPRConfig configuration class: TFDPRQuestionEncoder (DPR model)
    • Data2VecVisionConfig configuration class: TFData2VecVisionModel (Data2VecVision model)
    • DebertaConfig configuration class: TFDebertaModel (DeBERTa model)
    • DebertaV2Config configuration class: TFDebertaV2Model (DeBERTa-v2 model)
    • DeiTConfig configuration class: TFDeiTModel (DeiT model)
    • DistilBertConfig configuration class: TFDistilBertModel (DistilBERT model)
    • EfficientFormerConfig configuration class: TFEfficientFormerModel (EfficientFormer model)
    • ElectraConfig configuration class: TFElectraModel (ELECTRA model)
    • EsmConfig configuration class: TFEsmModel (ESM model)
    • FlaubertConfig configuration class: TFFlaubertModel (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
    • GPT2Config configuration class: TFGPT2Model (OpenAI GPT-2 model)
    • GPTJConfig configuration class: TFGPTJModel (GPT-J model)
    • GroupViTConfig configuration class: TFGroupViTModel (GroupViT model)
    • HubertConfig configuration class: TFHubertModel (Hubert model)
    • IdeficsConfig configuration class: TFIdeficsModel (IDEFICS model)
    • LEDConfig configuration class: TFLEDModel (LED model)
    • LayoutLMConfig configuration class: TFLayoutLMModel (LayoutLM model)
    • LayoutLMv3Config configuration class: TFLayoutLMv3Model (LayoutLMv3 model)
    • LongformerConfig configuration class: TFLongformerModel (Longformer model)
    • LxmertConfig configuration class: TFLxmertModel (LXMERT model)
    • MBartConfig configuration class: TFMBartModel (mBART model)
    • MPNetConfig configuration class: TFMPNetModel (MPNet model)
    • MT5Config configuration class: TFMT5Model (MT5 model)
    • MarianConfig configuration class: TFMarianModel (Marian model)
    • MistralConfig configuration class: TFMistralModel (Mistral model)
    • MobileBertConfig configuration class: TFMobileBertModel (MobileBERT model)
    • MobileViTConfig configuration class: TFMobileViTModel (MobileViT model)
    • OPTConfig configuration class: TFOPTModel (OPT model)
    • OpenAIGPTConfig configuration class: TFOpenAIGPTModel (OpenAI GPT model)
    • PegasusConfig configuration class: TFPegasusModel (Pegasus model)
    • RegNetConfig configuration class: TFRegNetModel (RegNet model)
    • RemBertConfig configuration class: TFRemBertModel (RemBERT model)
    • ResNetConfig configuration class: TFResNetModel (ResNet model)
    • RoFormerConfig configuration class: TFRoFormerModel (RoFormer model)
    • RobertaConfig configuration class: TFRobertaModel (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
    • SamConfig configuration class: TFSamModel (SAM model)
    • SamVisionConfig configuration class: TFSamVisionModel (SamVisionModel model)
    • SegformerConfig configuration class: TFSegformerModel (SegFormer model)
    • Speech2TextConfig configuration class: TFSpeech2TextModel (Speech2Text model)
    • SwiftFormerConfig configuration class: TFSwiftFormerModel (SwiftFormer model)
    • SwinConfig configuration class: TFSwinModel (Swin Transformer model)
    • T5Config configuration class: TFT5Model (T5 model)
    • TapasConfig configuration class: TFTapasModel (TAPAS model)
    • TransfoXLConfig configuration class: TFTransfoXLModel (Transformer-XL model)
    • ViTConfig configuration class: TFViTModel (ViT model)
    • ViTMAEConfig configuration class: TFViTMAEModel (ViTMAE model)
    • VisionTextDualEncoderConfig configuration class: TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
    • Wav2Vec2Config configuration class: TFWav2Vec2Model (Wav2Vec2 model)
    • WhisperConfig configuration class: TFWhisperModel (Whisper model)
    • XGLMConfig configuration class: TFXGLMModel (XGLM model)
    • XLMConfig configuration class: TFXLMModel (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaModel (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetModel (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the base model classes of the library from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
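You can check this note directly. Running TFAutoModel would require a TensorFlow install, so this hedged sketch uses the PyTorch AutoModel instead, which documents the same from_config()/from_pretrained() split. A tiny BertModel (arbitrary sizes, not a real checkpoint) is saved, then rebuilt twice: once from its configuration alone, and once with from_pretrained(). Only the second reproduces the saved weights.

```python
import tempfile

import torch
from transformers import AutoConfig, AutoModel, BertConfig, BertModel

# Tiny, arbitrary sizes so the sketch runs offline (not a real checkpoint).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)

torch.manual_seed(0)
reference = BertModel(config)  # stands in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    reference.save_pretrained(tmp_dir)
    loaded_config = AutoConfig.from_pretrained(tmp_dir)

    torch.manual_seed(1)  # a different seed gives a different random initialization
    untrained = AutoModel.from_config(loaded_config)  # architecture only, fresh weights
    restored = AutoModel.from_pretrained(tmp_dir)     # architecture plus the saved weights

key = "embeddings.word_embeddings.weight"
print(torch.equal(restored.state_dict()[key], reference.state_dict()[key]))   # True
print(torch.equal(untrained.state_dict()[key], reference.state_dict()[key]))  # False
```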

Examples:

>>> from transformers import AutoConfig, TFAutoModel

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModel.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
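When no config is passed, the configuration step splits the keyword arguments. AutoConfig.from_pretrained() exposes that split through `return_unused_kwargs=True`: keys that match a configuration attribute are applied to the config, and the rest are returned. This sketch only touches the configuration, so it needs no TensorFlow. `foo` is a deliberately made-up keyword, and the saved BertConfig is just a stand-in for a real checkpoint.

```python
import tempfile

from transformers import AutoConfig, BertConfig

with tempfile.TemporaryDirectory() as tmp_dir:
    BertConfig(hidden_dropout_prob=0.1).save_pretrained(tmp_dir)
    config, unused_kwargs = AutoConfig.from_pretrained(
        tmp_dir,
        hidden_dropout_prob=0.0,  # matches a config attribute, so it overrides the saved value
        foo=False,                # hypothetical key that matches no config attribute
        return_unused_kwargs=True,
    )

print(config.hidden_dropout_prob)  # 0.0
print("foo" in unused_kwargs, "hidden_dropout_prob" in unused_kwargs)  # True False
```

Per the description above, these leftovers are what gets forwarded to the model's `__init__`.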

Instantiate one of the base model classes of the library from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertModel (ALBERT model)
  • bart: TFBartModel (BART model)
  • bert: TFBertModel (BERT model)
  • blenderbot: TFBlenderbotModel (Blenderbot model)
  • blenderbot-small: TFBlenderbotSmallModel (BlenderbotSmall model)
  • blip: TFBlipModel (BLIP model)
  • camembert: TFCamembertModel (CamemBERT model)
  • clip: TFCLIPModel (CLIP model)
  • convbert: TFConvBertModel (ConvBERT model)
  • convnext: TFConvNextModel (ConvNeXT model)
  • convnextv2: TFConvNextV2Model (ConvNeXTV2 model)
  • ctrl: TFCTRLModel (CTRL model)
  • cvt: TFCvtModel (CvT model)
  • data2vec-vision: TFData2VecVisionModel (Data2VecVision model)
  • deberta: TFDebertaModel (DeBERTa model)
  • deberta-v2: TFDebertaV2Model (DeBERTa-v2 model)
  • deit: TFDeiTModel (DeiT model)
  • distilbert: TFDistilBertModel (DistilBERT model)
  • dpr: TFDPRQuestionEncoder (DPR model)
  • efficientformer: TFEfficientFormerModel (EfficientFormer model)
  • electra: TFElectraModel (ELECTRA model)
  • esm: TFEsmModel (ESM model)
  • flaubert: TFFlaubertModel (FlauBERT model)
  • funnel: TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
  • gpt-sw3: TFGPT2Model (GPT-Sw3 model)
  • gpt2: TFGPT2Model (OpenAI GPT-2 model)
  • gptj: TFGPTJModel (GPT-J model)
  • groupvit: TFGroupViTModel (GroupViT model)
  • hubert: TFHubertModel (Hubert model)
  • idefics: TFIdeficsModel (IDEFICS model)
  • layoutlm: TFLayoutLMModel (LayoutLM model)
  • layoutlmv3: TFLayoutLMv3Model (LayoutLMv3 model)
  • led: TFLEDModel (LED model)
  • longformer: TFLongformerModel (Longformer model)
  • lxmert: TFLxmertModel (LXMERT model)
  • marian: TFMarianModel (Marian model)
  • mbart: TFMBartModel (mBART model)
  • mistral: TFMistralModel (Mistral model)
  • mobilebert: TFMobileBertModel (MobileBERT model)
  • mobilevit: TFMobileViTModel (MobileViT model)
  • mpnet: TFMPNetModel (MPNet model)
  • mt5: TFMT5Model (MT5 model)
  • openai-gpt: TFOpenAIGPTModel (OpenAI GPT model)
  • opt: TFOPTModel (OPT model)
  • pegasus: TFPegasusModel (Pegasus model)
  • regnet: TFRegNetModel (RegNet model)
  • rembert: TFRemBertModel (RemBERT model)
  • resnet: TFResNetModel (ResNet model)
  • roberta: TFRobertaModel (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerModel (RoFormer model)
  • sam: TFSamModel (SAM model)
  • sam_vision_model: TFSamVisionModel (SamVisionModel model)
  • segformer: TFSegformerModel (SegFormer model)
  • speech_to_text: TFSpeech2TextModel (Speech2Text model)
  • swiftformer: TFSwiftFormerModel (SwiftFormer model)
  • swin: TFSwinModel (Swin Transformer model)
  • t5: TFT5Model (T5 model)
  • tapas: TFTapasModel (TAPAS model)
  • transfo-xl: TFTransfoXLModel (Transformer-XL model)
  • vision-text-dual-encoder: TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
  • vit: TFViTModel (ViT model)
  • vit_mae: TFViTMAEModel (ViTMAE model)
  • wav2vec2: TFWav2Vec2Model (Wav2Vec2 model)
  • whisper: TFWhisperModel (Whisper model)
  • xglm: TFXGLMModel (XGLM model)
  • xlm: TFXLMModel (XLM model)
  • xlm-roberta: TFXLMRobertaModel (XLM-RoBERTa model)
  • xlnet: TFXLNetModel (XLNet model)
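You can try the first half of this lookup, from the `model_type` string to a configuration class, without TensorFlow installed. The sketch writes a minimal, hand-made config.json and lets AutoConfig resolve it. The file contains only `model_type` plus one override; every other field falls back to BertConfig defaults. Given the resulting BertConfig, TFAutoModel would then pick TFBertModel from the list above.

```python
import json
import os
import tempfile

from transformers import AutoConfig

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hand-written config.json: "model_type" is the key the auto classes dispatch on.
    with open(os.path.join(tmp_dir, "config.json"), "w") as f:
        json.dump({"model_type": "bert", "num_hidden_layers": 2}, f)
    config = AutoConfig.from_pretrained(tmp_dir)

print(type(config).__name__, config.model_type)  # BertConfig bert
print(config.num_hidden_layers)                  # 2
```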

Examples:

>>> from transformers import AutoConfig, TFAutoModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModel.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModel

class transformers.FlaxAutoModel

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertModel (ALBERT model)
    • BartConfig configuration class: FlaxBartModel (BART model)
    • BeitConfig configuration class: FlaxBeitModel (BEiT model)
    • BertConfig configuration class: FlaxBertModel (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdModel (BigBird model)
    • BlenderbotConfig configuration class: FlaxBlenderbotModel (Blenderbot model)
    • BlenderbotSmallConfig configuration class: FlaxBlenderbotSmallModel (BlenderbotSmall model)
    • BloomConfig configuration class: FlaxBloomModel (BLOOM model)
    • CLIPConfig configuration class: FlaxCLIPModel (CLIP model)
    • Dinov2Config configuration class: FlaxDinov2Model (DINOv2 model)
    • DistilBertConfig configuration class: FlaxDistilBertModel (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraModel (ELECTRA model)
    • GPT2Config configuration class: FlaxGPT2Model (OpenAI GPT-2 model)
    • GPTJConfig configuration class: FlaxGPTJModel (GPT-J model)
    • GPTNeoConfig configuration class: FlaxGPTNeoModel (GPT Neo model)
    • GemmaConfig configuration class: FlaxGemmaModel (Gemma model)
    • LlamaConfig configuration class: FlaxLlamaModel (LLaMA model)
    • LongT5Config configuration class: FlaxLongT5Model (LongT5 model)
    • MBartConfig configuration class: FlaxMBartModel (mBART model)
    • MT5Config configuration class: FlaxMT5Model (MT5 model)
    • MarianConfig configuration class: FlaxMarianModel (Marian model)
    • MistralConfig configuration class: FlaxMistralModel (Mistral model)
    • OPTConfig configuration class: FlaxOPTModel (OPT model)
    • PegasusConfig configuration class: FlaxPegasusModel (Pegasus model)
    • RegNetConfig configuration class: FlaxRegNetModel (RegNet model)
    • ResNetConfig configuration class: FlaxResNetModel (ResNet model)
    • RoFormerConfig configuration class: FlaxRoFormerModel (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaModel (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
    • T5Config configuration class: FlaxT5Model (T5 model)
    • ViTConfig configuration class: FlaxViTModel (ViT model)
    • VisionTextDualEncoderConfig configuration class: FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
    • Wav2Vec2Config configuration class: FlaxWav2Vec2Model (Wav2Vec2 model)
    • WhisperConfig configuration class: FlaxWhisperModel (Whisper model)
    • XGLMConfig configuration class: FlaxXGLMModel (XGLM model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaModel (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the base model classes of the library from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModel

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModel.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model and loading the Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
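The forwarding described in these two bullets is easiest to see with a keyword that belongs to the model constructor rather than to the configuration. This sketch assumes PyTorch and uses AutoModel with BertModel's `add_pooling_layer` argument because its effect is visible. It does not claim that the corresponding Flax class accepts the same argument. In both calls the keyword reaches `BertModel.__init__`. With an explicit config it is passed through directly; without one, the configuration step leaves it unused, so it is forwarded as well. The tiny configuration is arbitrary, not a real checkpoint.

```python
import tempfile

from transformers import AutoConfig, AutoModel, BertConfig, BertModel

# Tiny, arbitrary sizes so the sketch runs offline (not a real checkpoint).
tiny = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                  num_attention_heads=2, intermediate_size=64)

with tempfile.TemporaryDirectory() as tmp_dir:
    BertModel(tiny).save_pretrained(tmp_dir)
    config = AutoConfig.from_pretrained(tmp_dir)

    # config given: add_pooling_layer goes straight to BertModel.__init__
    with_config = AutoModel.from_pretrained(tmp_dir, config=config, add_pooling_layer=False)
    # no config: add_pooling_layer is not a config attribute, so it is forwarded as well
    without_config = AutoModel.from_pretrained(tmp_dir, add_pooling_layer=False)
    default = AutoModel.from_pretrained(tmp_dir)

print(with_config.pooler, without_config.pooler)  # None None
print(default.pooler is not None)                  # True
```

Transformers may log a warning that the saved pooler weights were not used; that is expected here.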

Instantiate one of the base model classes of the library from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertModel (ALBERT model)
  • bart: FlaxBartModel (BART model)
  • beit: FlaxBeitModel (BEiT model)
  • bert: FlaxBertModel (BERT model)
  • big_bird: FlaxBigBirdModel (BigBird model)
  • blenderbot: FlaxBlenderbotModel (Blenderbot model)
  • blenderbot-small: FlaxBlenderbotSmallModel (BlenderbotSmall model)
  • bloom: FlaxBloomModel (BLOOM model)
  • clip: FlaxCLIPModel (CLIP model)
  • dinov2: FlaxDinov2Model (DINOv2 model)
  • distilbert: FlaxDistilBertModel (DistilBERT model)
  • electra: FlaxElectraModel (ELECTRA model)
  • gemma: FlaxGemmaModel (Gemma model)
  • gpt-sw3: FlaxGPT2Model (GPT-Sw3 model)
  • gpt2: FlaxGPT2Model (OpenAI GPT-2 model)
  • gpt_neo: FlaxGPTNeoModel (GPT Neo model)
  • gptj: FlaxGPTJModel (GPT-J model)
  • llama: FlaxLlamaModel (LLaMA model)
  • longt5: FlaxLongT5Model (LongT5 model)
  • marian: FlaxMarianModel (Marian model)
  • mbart: FlaxMBartModel (mBART model)
  • mistral: FlaxMistralModel (Mistral model)
  • mt5: FlaxMT5Model (MT5 model)
  • opt: FlaxOPTModel (OPT model)
  • pegasus: FlaxPegasusModel (Pegasus model)
  • regnet: FlaxRegNetModel (RegNet model)
  • resnet: FlaxResNetModel (ResNet model)
  • roberta: FlaxRobertaModel (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerModel (RoFormer model)
  • t5: FlaxT5Model (T5 model)
  • vision-text-dual-encoder: FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
  • vit: FlaxViTModel (ViT model)
  • wav2vec2: FlaxWav2Vec2Model (Wav2Vec2 model)
  • whisper: FlaxWhisperModel (Whisper model)
  • xglm: FlaxXGLMModel (XGLM model)
  • xlm-roberta: FlaxXLMRobertaModel (XLM-RoBERTa model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModel.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

Generic pretraining classes

The following auto classes are available for instantiating a model with a pretraining head.

AutoModelForPreTraining

class transformers.AutoModelForPreTraining

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
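This short sketch illustrates both statements, assuming PyTorch is installed and using a tiny, arbitrary BertConfig. from_config() returns the pretraining-head class listed for BertConfig, while calling the constructor directly raises an error. The exception is caught as OSError, which is the class that EnvironmentError aliases in Python 3.

```python
from transformers import AutoModelForPreTraining, BertConfig

# Tiny, arbitrary sizes so the sketch runs offline (not a real checkpoint).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)

# Supported entry point: the config class selects the pretraining-head model.
model = AutoModelForPreTraining.from_config(config)
print(type(model).__name__)  # BertForPreTraining

# Calling the constructor directly is rejected by design.
try:
    AutoModelForPreTraining()
    direct_init_failed = False
except OSError:  # EnvironmentError is an alias of OSError in Python 3
    direct_init_failed = True
print(direct_init_failed)  # True
```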

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForPreTraining (ALBERT model)
    • BartConfig configuration class: BartForConditionalGeneration (BART model)
    • BertConfig configuration class: BertForPreTraining (BERT model)
    • BigBirdConfig configuration class: BigBirdForPreTraining (BigBird model)
    • BloomConfig configuration class: BloomForCausalLM (BLOOM model)
    • CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
    • CamembertConfig configuration class: CamembertForMaskedLM (CamemBERT model)
    • ColPaliConfig configuration class: ColPaliForRetrieval (ColPali model)
    • ColQwen2Config configuration class: ColQwen2ForRetrieval (ColQwen2 model)
    • Data2VecTextConfig configuration class: Data2VecTextForMaskedLM (Data2VecText model)
    • DebertaConfig configuration class: DebertaForMaskedLM (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2ForMaskedLM (DeBERTa-v2 model)
    • DistilBertConfig configuration class: DistilBertForMaskedLM (DistilBERT model)
    • ElectraConfig configuration class: ElectraForPreTraining (ELECTRA model)
    • ErnieConfig configuration class: ErnieForPreTraining (ERNIE model)
    • EvollaConfig configuration class: EvollaForProteinText2Text (Evolla model)
    • Exaone4Config configuration class: Exaone4ForCausalLM (EXAONE-4.0 model)
    • FNetConfig configuration class: FNetForPreTraining (FNet model)
    • FSMTConfig configuration class: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
    • FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
    • FlaubertConfig configuration class: FlaubertWithLMHeadModel (FlauBERT model)
    • FlavaConfig configuration class: FlavaForPreTraining (FLAVA model)
    • Florence2Config configuration class: Florence2ForConditionalGeneration (Florence2 model)
    • FunnelConfig configuration class: FunnelForPreTraining (Funnel Transformer model)
    • GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
    • GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
    • Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
    • HieraConfig configuration class: HieraForPreTraining (Hiera model)
    • IBertConfig configuration class: IBertForMaskedLM (I-BERT model)
    • Idefics2Config configuration class: Idefics2ForConditionalGeneration (Idefics2 model)
    • Idefics3Config configuration class: Idefics3ForConditionalGeneration (Idefics3 model)
    • IdeficsConfig configuration class: IdeficsForVisionText2Text (IDEFICS model)
    • JanusConfig configuration class: JanusForConditionalGeneration (Janus model)
    • LayoutLMConfig configuration class: LayoutLMForMaskedLM (LayoutLM model)
    • LlavaConfig configuration class: LlavaForConditionalGeneration (LLaVa model)
    • LlavaNextConfig configuration class: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
    • LlavaNextVideoConfig configuration class: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
    • LlavaOnevisionConfig configuration class: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
    • LongformerConfig configuration class: LongformerForMaskedLM (Longformer model)
    • LukeConfig configuration class: LukeForMaskedLM (LUKE model)
    • LxmertConfig configuration class: LxmertForPreTraining (LXMERT model)
    • MPNetConfig configuration class: MPNetForMaskedLM (MPNet model)
    • Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
    • MambaConfig configuration class: MambaForCausalLM (Mamba model)
    • MegaConfig configuration class: MegaForMaskedLM (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForPreTraining (Megatron-BERT model)
    • Mistral3Config configuration class: Mistral3ForConditionalGeneration (Mistral3 model)
    • MllamaConfig configuration class: MllamaForConditionalGeneration (Mllama model)
    • MobileBertConfig configuration class: MobileBertForPreTraining (MobileBERT model)
    • MptConfig configuration class: MptForCausalLM (MPT model)
    • MraConfig configuration class: MraForMaskedLM (MRA model)
    • MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
    • NezhaConfig configuration class: NezhaForPreTraining (Nezha model)
    • NllbMoeConfig configuration class: NllbMoeForConditionalGeneration (NLLB-MOE model)
    • OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
    • PaliGemmaConfig configuration class: PaliGemmaForConditionalGeneration (PaliGemma model)
    • Qwen2AudioConfig configuration class: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
    • RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
    • RoCBertConfig configuration class: RoCBertForPreTraining (RoCBert model)
    • RobertaConfig configuration class: RobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
    • SplinterConfig configuration class: SplinterForPreTraining (Splinter model)
    • SqueezeBertConfig configuration class: SqueezeBertForMaskedLM (SqueezeBERT model)
    • SwitchTransformersConfig configuration class: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
    • T5Config configuration class: T5ForConditionalGeneration (T5 model)
    • T5GemmaConfig configuration class: T5GemmaForConditionalGeneration (T5Gemma model)
    • TapasConfig configuration class: TapasForMaskedLM (TAPAS model)
    • TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
    • TvltConfig configuration class: TvltForPreTraining (TVLT model)
    • UniSpeechConfig configuration class: UniSpeechForPreTraining (UniSpeech model)
    • UniSpeechSatConfig configuration class: UniSpeechSatForPreTraining (UniSpeechSat model)
    • ViTMAEConfig configuration class: ViTMAEForPreTraining (ViTMAE model)
    • VideoLlavaConfig configuration class: VideoLlavaForConditionalGeneration (VideoLlava model)
    • VideoMAEConfig configuration class: VideoMAEForPreTraining (VideoMAE model)
    • VipLlavaConfig configuration class: VipLlavaForConditionalGeneration (VipLlava model)
    • VisualBertConfig configuration class: VisualBertForPreTraining (VisualBERT model)
    • VoxtralConfig configuration class: VoxtralForConditionalGeneration (Voxtral model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForPreTraining (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForPreTraining (Wav2Vec2-Conformer model)
    • XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForMaskedLM (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
    • XmodConfig configuration class: XmodForMaskedLM (X-MOD model)
    • xLSTMConfig configuration class: xLSTMForCausalLM (xLSTM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a pretraining head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForPreTraining

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForPreTraining.from_config(config)
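The table above is a lookup from the *type* of the configuration object to a model class. The following is a minimal pure-Python sketch of that dispatch, assuming hypothetical stand-in classes (not the real transformers classes) so it runs without the library. As the note says, building the model from a config creates fresh, unloaded weights.

```python
# Hypothetical stand-ins: these are NOT the real transformers classes.
class BertConfig:
    model_type = "bert"


class T5Config:
    model_type = "t5"


class BertForPreTraining:
    def __init__(self, config):
        self.config = config  # a real model would randomly initialize its weights here


class T5ForConditionalGeneration:
    def __init__(self, config):
        self.config = config


# Configuration class -> model class, mirroring two rows of the table above.
MODEL_FOR_PRETRAINING_MAPPING = {
    BertConfig: BertForPreTraining,
    T5Config: T5ForConditionalGeneration,
}


def from_config(config):
    """Pick the model class from the config's type and build it (no weights are loaded)."""
    try:
        model_class = MODEL_FOR_PRETRAINING_MAPPING[type(config)]
    except KeyError:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}") from None
    return model_class(config)


model = from_config(BertConfig())
print(type(model).__name__)  # BertForPreTraining
```

The sketch keys on the exact configuration class rather than on a string, which is why a custom configuration has to be registered before an auto class can resolve it.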

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model's __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
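The two rules for **kwargs can be summarized as a small routing function. This is a hypothetical helper, not a transformers API. The library performs this split internally while loading the configuration. The attribute names and the extra keyword `my_model_flag` below are made up for illustration.

```python
def route_kwargs(config_attributes, config_provided, **kwargs):
    """Hypothetical helper mirroring the two rules above.

    Returns a pair (config_overrides, model_init_kwargs).
    """
    if config_provided:
        # Rule 1: an explicit config is assumed final, so every kwarg goes to __init__.
        return {}, dict(kwargs)
    # Rule 2: keys that name a configuration attribute override that attribute...
    config_overrides = {k: v for k, v in kwargs.items() if k in config_attributes}
    # ...and all remaining keys are forwarded to the model's __init__.
    model_init_kwargs = {k: v for k, v in kwargs.items() if k not in config_attributes}
    return config_overrides, model_init_kwargs


# Hypothetical attribute names standing in for a real configuration class.
attrs = {"output_attentions", "hidden_dropout_prob"}
print(route_kwargs(attrs, False, output_attentions=True, my_model_flag=1))
# ({'output_attentions': True}, {'my_model_flag': 1})
print(route_kwargs(attrs, True, output_attentions=True))
# ({}, {'output_attentions': True})
```

The second call shows why passing config together with a configuration-level kwarg can surprise you: the kwarg is not applied to the config but forwarded to the model's constructor.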

Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForPreTraining (ALBERT model)
  • bart: BartForConditionalGeneration (BART model)
  • bert: BertForPreTraining (BERT model)
  • big_bird: BigBirdForPreTraining (BigBird model)
  • bloom: BloomForCausalLM (BLOOM model)
  • camembert: CamembertForMaskedLM (CamemBERT model)
  • colpali: ColPaliForRetrieval (ColPali model)
  • colqwen2: ColQwen2ForRetrieval (ColQwen2 model)
  • ctrl: CTRLLMHeadModel (CTRL model)
  • data2vec-text: Data2VecTextForMaskedLM (Data2VecText model)
  • deberta: DebertaForMaskedLM (DeBERTa model)
  • deberta-v2: DebertaV2ForMaskedLM (DeBERTa-v2 model)
  • distilbert: DistilBertForMaskedLM (DistilBERT model)
  • electra: ElectraForPreTraining (ELECTRA model)
  • ernie: ErnieForPreTraining (ERNIE model)
  • evolla: EvollaForProteinText2Text (Evolla model)
  • exaone4: Exaone4ForCausalLM (EXAONE-4.0 model)
  • falcon_mamba: FalconMambaForCausalLM (FalconMamba model)
  • flaubert: FlaubertWithLMHeadModel (FlauBERT model)
  • flava: FlavaForPreTraining (FLAVA model)
  • florence2: Florence2ForConditionalGeneration (Florence2 model)
  • fnet: FNetForPreTraining (FNet model)
  • fsmt: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
  • funnel: FunnelForPreTraining (Funnel Transformer model)
  • gemma3: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
  • gpt-sw3: GPT2LMHeadModel (GPT-Sw3 model)
  • gpt2: GPT2LMHeadModel (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeForCausalLM (GPTBigCode model)
  • gptsan-japanese: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
  • hiera: HieraForPreTraining (Hiera model)
  • ibert: IBertForMaskedLM (I-BERT model)
  • idefics: IdeficsForVisionText2Text (IDEFICS model)
  • idefics2: Idefics2ForConditionalGeneration (Idefics2 model)
  • idefics3: Idefics3ForConditionalGeneration (Idefics3 model)
  • janus: JanusForConditionalGeneration (Janus model)
  • layoutlm: LayoutLMForMaskedLM (LayoutLM model)
  • llava: LlavaForConditionalGeneration (LLaVa model)
  • llava_next: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
  • llava_next_video: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
  • llava_onevision: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
  • longformer: LongformerForMaskedLM (Longformer model)
  • luke: LukeForMaskedLM (LUKE model)
  • lxmert: LxmertForPreTraining (LXMERT model)
  • mamba: MambaForCausalLM (Mamba model)
  • mamba2: Mamba2ForCausalLM (mamba2 model)
  • mega: MegaForMaskedLM (MEGA model)
  • megatron-bert: MegatronBertForPreTraining (Megatron-BERT model)
  • mistral3: Mistral3ForConditionalGeneration (Mistral3 model)
  • mllama: MllamaForConditionalGeneration (Mllama model)
  • mobilebert: MobileBertForPreTraining (MobileBERT model)
  • mpnet: MPNetForMaskedLM (MPNet model)
  • mpt: MptForCausalLM (MPT model)
  • mra: MraForMaskedLM (MRA model)
  • mvp: MvpForConditionalGeneration (MVP model)
  • nezha: NezhaForPreTraining (Nezha model)
  • nllb-moe: NllbMoeForConditionalGeneration (NLLB-MOE model)
  • openai-gpt: OpenAIGPTLMHeadModel (OpenAI GPT model)
  • paligemma: PaliGemmaForConditionalGeneration (PaliGemma model)
  • qwen2_audio: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
  • retribert: RetriBertModel (RetriBERT model)
  • roberta: RobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForPreTraining (RoCBert model)
  • rwkv: RwkvForCausalLM (RWKV model)
  • splinter: SplinterForPreTraining (Splinter model)
  • squeezebert: SqueezeBertForMaskedLM (SqueezeBERT model)
  • switch_transformers: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
  • t5: T5ForConditionalGeneration (T5 model)
  • t5gemma: T5GemmaForConditionalGeneration (T5Gemma model)
  • tapas: TapasForMaskedLM (TAPAS model)
  • transfo-xl: TransfoXLLMHeadModel (Transformer-XL model)
  • tvlt: TvltForPreTraining (TVLT model)
  • unispeech: UniSpeechForPreTraining (UniSpeech model)
  • unispeech-sat: UniSpeechSatForPreTraining (UniSpeechSat model)
  • video_llava: VideoLlavaForConditionalGeneration (VideoLlava model)
  • videomae: VideoMAEForPreTraining (VideoMAE model)
  • vipllava: VipLlavaForConditionalGeneration (VipLlava model)
  • visual_bert: VisualBertForPreTraining (VisualBERT model)
  • vit_mae: ViTMAEForPreTraining (ViTMAE model)
  • voxtral: VoxtralForConditionalGeneration (Voxtral model)
  • wav2vec2: Wav2Vec2ForPreTraining (Wav2Vec2 model)
  • wav2vec2-conformer: Wav2Vec2ConformerForPreTraining (Wav2Vec2-Conformer model)
  • xlm: XLMWithLMHeadModel (XLM model)
  • xlm-roberta: XLMRobertaForMaskedLM (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
  • xlnet: XLNetLMHeadModel (XLNet model)
  • xlstm: xLSTMForCausalLM (xLSTM model)
  • xmod: XmodForMaskedLM (X-MOD model)
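Pattern matching on the name is ambiguous because some model_type keys contain others ("roberta" contains "bert", "xlm-roberta-xl" contains "xlm-roberta"). A minimal sketch of a lookup that prefers an explicit model_type and otherwise tries longer patterns first is shown below. The longest-first order is our assumption about how the fallback resolves such overlaps, and the mapping holds only a few rows of the list above, as plain strings.

```python
# A few rows of the model_type -> class table above, with class names as strings.
MODEL_TYPE_TO_CLASS = {
    "bert": "BertForPreTraining",
    "roberta": "RobertaForMaskedLM",
    "roberta-prelayernorm": "RobertaPreLayerNormForMaskedLM",
    "xlm-roberta": "XLMRobertaForMaskedLM",
    "xlm-roberta-xl": "XLMRobertaXLForMaskedLM",
}


def resolve_class_name(name_or_path, model_type=None):
    """Prefer the config's model_type; otherwise fall back to substring matching."""
    if model_type is not None:
        return MODEL_TYPE_TO_CLASS[model_type]
    # Try longer patterns first so "roberta" wins over "bert" for "roberta-base".
    for pattern in sorted(MODEL_TYPE_TO_CLASS, key=len, reverse=True):
        if pattern in str(name_or_path):
            return MODEL_TYPE_TO_CLASS[pattern]
    raise ValueError(f"Could not infer a model type from {name_or_path!r}")


print(resolve_class_name("FacebookAI/roberta-base"))   # RobertaForMaskedLM
print(resolve_class_name("facebook/xlm-roberta-xl"))   # XLMRobertaXLForMaskedLM
print(resolve_class_name("./my_dir", model_type="bert"))  # BertForPreTraining
```

Without the length ordering, "FacebookAI/roberta-base" could match "bert" first, which is why relying on the model_type stored in config.json is the safer path.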

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForPreTraining

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForPreTraining.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForPreTraining

class transformers.TFAutoModelForPreTraining

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
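The "cannot be instantiated directly" rule is a factory-class pattern: __init__ refuses to run, and the only entry points are classmethods that look up the concrete class. A minimal sketch with hypothetical names (AutoLikeModel, NewModelConfig, NewModel) follows. It raises OSError; the exact exception type used by the library is not stated in this section.

```python
class AutoLikeModel:
    """Hypothetical factory class imitating the auto-class pattern described above."""

    _mapping = {}  # configuration class -> model class

    def __init__(self, *args, **kwargs):
        # Direct construction is refused; callers must go through a classmethod.
        raise OSError(
            f"{type(self).__name__} is designed to be instantiated using "
            f"`{type(self).__name__}.from_config(config)`."
        )

    @classmethod
    def register(cls, config_class, model_class):
        cls._mapping[config_class] = model_class

    @classmethod
    def from_config(cls, config):
        return cls._mapping[type(config)](config)


class NewModelConfig:  # hypothetical configuration class
    pass


class NewModel:  # hypothetical model class
    def __init__(self, config):
        self.config = config


AutoLikeModel.register(NewModelConfig, NewModel)
model = AutoLikeModel.from_config(NewModelConfig())

try:
    AutoLikeModel()
except OSError as err:
    print(err)  # AutoLikeModel is designed to be instantiated using `AutoLikeModel.from_config(config)`.
```

The generic class never holds weights itself; it only decides which concrete class to build.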

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForPreTraining (ALBERT model)
    • BartConfig configuration class: TFBartForConditionalGeneration (BART model)
    • BertConfig configuration class: TFBertForPreTraining (BERT model)
    • CTRLConfig configuration class: TFCTRLLMHeadModel (CTRL model)
    • CamembertConfig configuration class: TFCamembertForMaskedLM (CamemBERT model)
    • DistilBertConfig configuration class: TFDistilBertForMaskedLM (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForPreTraining (ELECTRA model)
    • FlaubertConfig configuration class: TFFlaubertWithLMHeadModel (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForPreTraining (Funnel Transformer model)
    • GPT2Config configuration class: TFGPT2LMHeadModel (OpenAI GPT-2 model)
    • IdeficsConfig configuration class: TFIdeficsForVisionText2Text (IDEFICS model)
    • LayoutLMConfig configuration class: TFLayoutLMForMaskedLM (LayoutLM model)
    • LxmertConfig configuration class: TFLxmertForPreTraining (LXMERT model)
    • MPNetConfig configuration class: TFMPNetForMaskedLM (MPNet model)
    • MobileBertConfig configuration class: TFMobileBertForPreTraining (MobileBERT model)
    • OpenAIGPTConfig configuration class: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
    • RobertaConfig configuration class: TFRobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • T5Config configuration class: TFT5ForConditionalGeneration (T5 model)
    • TapasConfig configuration class: TFTapasForMaskedLM (TAPAS model)
    • TransfoXLConfig configuration class: TFTransfoXLLMHeadModel (Transformer-XL model)
    • ViTMAEConfig configuration class: TFViTMAEForPreTraining (ViTMAE model)
    • XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetLMHeadModel (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a pretraining head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForPreTraining

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForPreTraining.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model with the provided conversion scripts and then loading the TensorFlow model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model's __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
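The proxies dictionary mixes per-protocol keys ("http") with per-endpoint keys ("http://hostname"). A minimal sketch of how such a dictionary is typically resolved for one request URL is shown below, most specific key first. This order mirrors the lookup in the requests library's select_proxy helper; whether the Hub download path uses that exact logic is an assumption here.

```python
from urllib.parse import urlparse


def select_proxy(url, proxies):
    """Return the proxy for `url`, trying the most specific key first (None if no match)."""
    proxies = proxies or {}
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    for key in (
        f"{parts.scheme}://{parts.hostname}",  # scheme + host, e.g. "http://hostname"
        parts.scheme,                          # scheme only, e.g. "http"
        f"all://{parts.hostname}",             # any scheme, this host
        "all",                                 # catch-all
    ):
        if key in proxies:
            return proxies[key]
    return None


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/model.bin", proxies))  # foo.bar:4012
print(select_proxy("http://other/model.bin", proxies))     # foo.bar:3128
print(select_proxy("https://huggingface.co/x", proxies))   # None
```

Note that the example dictionary from the parameter description has no "https" entry, so HTTPS requests would go out without a proxy under this lookup.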

Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForPreTraining (ALBERT model)
  • bart: TFBartForConditionalGeneration (BART model)
  • bert: TFBertForPreTraining (BERT model)
  • camembert: TFCamembertForMaskedLM (CamemBERT model)
  • ctrl: TFCTRLLMHeadModel (CTRL model)
  • distilbert: TFDistilBertForMaskedLM (DistilBERT model)
  • electra: TFElectraForPreTraining (ELECTRA model)
  • flaubert: TFFlaubertWithLMHeadModel (FlauBERT model)
  • funnel: TFFunnelForPreTraining (Funnel Transformer model)
  • gpt-sw3: TFGPT2LMHeadModel (GPT-Sw3 model)
  • gpt2: TFGPT2LMHeadModel (OpenAI GPT-2 model)
  • idefics: TFIdeficsForVisionText2Text (IDEFICS model)
  • layoutlm: TFLayoutLMForMaskedLM (LayoutLM model)
  • lxmert: TFLxmertForPreTraining (LXMERT model)
  • mobilebert: TFMobileBertForPreTraining (MobileBERT model)
  • mpnet: TFMPNetForMaskedLM (MPNet model)
  • openai-gpt: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
  • roberta: TFRobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • t5: TFT5ForConditionalGeneration (T5 model)
  • tapas: TFTapasForMaskedLM (TAPAS model)
  • transfo-xl: TFTransfoXLLMHeadModel (Transformer-XL model)
  • vit_mae: TFViTMAEForPreTraining (ViTMAE model)
  • xlm: TFXLMWithLMHeadModel (XLM model)
  • xlm-roberta: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
  • xlnet: TFXLNetLMHeadModel (XLNet model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForPreTraining

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForPreTraining.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForPreTraining

class transformers.FlaxAutoModelForPreTraining

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForPreTraining (ALBERT model)
    • BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
    • BertConfig configuration class: FlaxBertForPreTraining (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForPreTraining (BigBird model)
    • ElectraConfig configuration class: FlaxElectraForPreTraining (ELECTRA model)
    • LongT5Config configuration class: FlaxLongT5ForConditionalGeneration (LongT5 model)
    • MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
    • MT5Config configuration class: FlaxMT5ForConditionalGeneration (MT5 model)
    • RoFormerConfig configuration class: FlaxRoFormerForMaskedLM (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • T5Config configuration class: FlaxT5ForConditionalGeneration (T5 model)
    • Wav2Vec2Config configuration class: FlaxWav2Vec2ForPreTraining (Wav2Vec2 model)
    • WhisperConfig configuration class: FlaxWhisperForConditionalGeneration (Whisper model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a pretraining head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForPreTraining

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForPreTraining.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model with the provided conversion scripts and then loading the Flax model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model's __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
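The "missing keys" and "unexpected keys" reported when output_loading_info=True can be understood as two set differences between the parameter names the model expects and those found in the checkpoint. The sketch below is a hypothetical helper, not the library's implementation. The dictionary key names are assumptions, the real dictionary may carry additional entries, and the parameter names in the example are made up.

```python
def loading_info(expected_keys, loaded_keys):
    """Hypothetical sketch: compare model parameter names with checkpoint parameter names."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    return {
        # In the model but absent from the checkpoint: these stay randomly initialized.
        "missing_keys": sorted(expected - loaded),
        # In the checkpoint but unused by the model (e.g. a head this class does not have).
        "unexpected_keys": sorted(loaded - expected),
        "error_msgs": [],  # a real loader would collect shape or load errors here
    }


info = loading_info(
    expected_keys=["bert.embeddings.word_embeddings.weight", "cls.predictions.bias"],
    loaded_keys=["bert.embeddings.word_embeddings.weight", "cls.seq_relationship.weight"],
)
print(info["missing_keys"])     # ['cls.predictions.bias']
print(info["unexpected_keys"])  # ['cls.seq_relationship.weight']
```

An empty missing_keys list is a quick check that every weight the model needs was found in the checkpoint.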

Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForPreTraining (ALBERT model)
  • bart: FlaxBartForConditionalGeneration (BART model)
  • bert: FlaxBertForPreTraining (BERT model)
  • big_bird: FlaxBigBirdForPreTraining (BigBird model)
  • electra: FlaxElectraForPreTraining (ELECTRA model)
  • longt5: FlaxLongT5ForConditionalGeneration (LongT5 model)
  • mbart: FlaxMBartForConditionalGeneration (mBART model)
  • mt5: FlaxMT5ForConditionalGeneration (MT5 model)
  • roberta: FlaxRobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForMaskedLM (RoFormer model)
  • t5: FlaxT5ForConditionalGeneration (T5 model)
  • wav2vec2: FlaxWav2Vec2ForPreTraining (Wav2Vec2 model)
  • whisper: FlaxWhisperForConditionalGeneration (Whisper model)
  • xlm-roberta: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
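As a rough illustration of the first step, the lookup from a saved config.json to a class name might look like the sketch below. Only a few entries from the list are copied in; FLAX_PRETRAINING_MAPPING and resolve_class_name are hypothetical names, and the real auto class returns a class object rather than a string.

```python
import json

# A few entries copied from the list above: model_type -> class name.
FLAX_PRETRAINING_MAPPING = {
    "albert": "FlaxAlbertForPreTraining",
    "bert": "FlaxBertForPreTraining",
    "roberta": "FlaxRobertaForMaskedLM",
    "t5": "FlaxT5ForConditionalGeneration",
}


def resolve_class_name(config_json: str) -> str:
    """Return the class name registered for the model_type stored in a config.json string."""
    config = json.loads(config_json)
    model_type = config.get("model_type")
    if model_type is None:
        # The real auto class would fall back to matching on the checkpoint name here.
        raise ValueError("config has no model_type")
    if model_type not in FLAX_PRETRAINING_MAPPING:
        raise ValueError(f"unrecognized model_type {model_type!r}")
    return FLAX_PRETRAINING_MAPPING[model_type]


print(resolve_class_name('{"model_type": "bert", "hidden_size": 768}'))  # FlaxBertForPreTraining
```

Note that several architectures (for example roberta) map to a masked-LM class rather than a dedicated pretraining class, exactly as the list shows.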

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForPreTraining

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForPreTraining.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

Natural Language Processing

The following auto classes are available for natural language processing tasks.

AutoModelForCausalLM

class transformers.AutoModelForCausalLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
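A minimal sketch of this factory pattern, assuming a simple dictionary from configuration class to model class: direct construction raises, and from_config() dispatches on the type of the configuration it receives. All Toy* classes are hypothetical stand-ins, and the choice of OSError is an assumption of the sketch.

```python
class ToyBertConfig:
    pass


class ToyGPT2Config:
    pass


class ToyBertLMHeadModel:
    def __init__(self, config):
        self.config = config


class ToyGPT2LMHeadModel:
    def __init__(self, config):
        self.config = config


class ToyAutoModelForCausalLM:
    # Configuration class -> model class, mirroring the from_config() list below.
    _model_mapping = {
        ToyBertConfig: ToyBertLMHeadModel,
        ToyGPT2Config: ToyGPT2LMHeadModel,
    }

    def __init__(self, *args, **kwargs):
        # Direct construction is refused; callers must use a factory classmethod.
        raise OSError(
            f"{type(self).__name__} is designed to be instantiated with from_config() or from_pretrained()."
        )

    @classmethod
    def from_config(cls, config):
        model_class = cls._model_mapping.get(type(config))
        if model_class is None:
            raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
        # The returned object is an instance of the concrete class, not of the auto class.
        return model_class(config)


model = ToyAutoModelForCausalLM.from_config(ToyGPT2Config())
print(type(model).__name__)  # ToyGPT2LMHeadModel
```

Because the auto class only dispatches, the object you get back is always an instance of a concrete architecture class.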

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • ApertusConfig configuration class: ApertusForCausalLM (Apertus model)
    • ArceeConfig configuration class: ArceeForCausalLM (Arcee model)
    • AriaTextConfig configuration class: AriaTextForCausalLM (AriaText model)
    • BambaConfig configuration class: BambaForCausalLM (Bamba model)
    • BartConfig configuration class: BartForCausalLM (BART model)
    • BertConfig configuration class: BertLMHeadModel (BERT model)
    • BertGenerationConfig configuration class: BertGenerationDecoder (Bert Generation model)
    • BigBirdConfig configuration class: BigBirdForCausalLM (BigBird model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusForCausalLM (BigBird-Pegasus model)
    • BioGptConfig configuration class: BioGptForCausalLM (BioGpt model)
    • BitNetConfig configuration class: BitNetForCausalLM (BitNet model)
    • BlenderbotConfig configuration class: BlenderbotForCausalLM (Blenderbot model)
    • BlenderbotSmallConfig configuration class: BlenderbotSmallForCausalLM (BlenderbotSmall model)
    • BloomConfig configuration class: BloomForCausalLM (BLOOM model)
    • CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
    • CamembertConfig configuration class: CamembertForCausalLM (CamemBERT model)
    • CodeGenConfig configuration class: CodeGenForCausalLM (CodeGen model)
    • Cohere2Config configuration class: Cohere2ForCausalLM (Cohere2 model)
    • CohereConfig configuration class: CohereForCausalLM (Cohere model)
    • CpmAntConfig configuration class: CpmAntForCausalLM (CPM-Ant model)
    • Data2VecTextConfig configuration class: Data2VecTextForCausalLM (Data2VecText model)
    • DbrxConfig configuration class: DbrxForCausalLM (DBRX model)
    • DeepseekV2Config configuration class: DeepseekV2ForCausalLM (DeepSeek-V2 model)
    • DeepseekV3Config configuration class: DeepseekV3ForCausalLM (DeepSeek-V3 model)
    • DiffLlamaConfig configuration class: DiffLlamaForCausalLM (DiffLlama model)
    • DogeConfig configuration class: DogeForCausalLM (Doge model)
    • Dots1Config configuration class: Dots1ForCausalLM (dots1 model)
    • ElectraConfig configuration class: ElectraForCausalLM (ELECTRA model)
    • Emu3Config configuration class: Emu3ForCausalLM (Emu3 model)
    • Ernie4_5Config configuration class: Ernie4_5ForCausalLM (Ernie4_5 model)
    • Ernie4_5_MoeConfig configuration class: Ernie4_5_MoeForCausalLM (Ernie4_5_MoE model)
    • ErnieConfig configuration class: ErnieForCausalLM (ERNIE model)
    • Exaone4Config configuration class: Exaone4ForCausalLM (EXAONE-4.0 model)
    • FalconConfig configuration class: FalconForCausalLM (Falcon model)
    • FalconH1Config configuration class: FalconH1ForCausalLM (FalconH1 model)
    • FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
    • FuyuConfig configuration class: FuyuForCausalLM (Fuyu model)
    • GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
    • GPTJConfig configuration class: GPTJForCausalLM (GPT-J model)
    • GPTNeoConfig configuration class: GPTNeoForCausalLM (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXForCausalLM (GPT NeoX model)
    • GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
    • Gemma2Config configuration class: Gemma2ForCausalLM (Gemma2 model)
    • Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
    • Gemma3TextConfig configuration class: Gemma3ForCausalLM (Gemma3ForCausalLM model)
    • Gemma3nConfig configuration class: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
    • Gemma3nTextConfig configuration class: Gemma3nForCausalLM (Gemma3nForCausalLM model)
    • GemmaConfig configuration class: GemmaForCausalLM (Gemma model)
    • GitConfig configuration class: GitForCausalLM (GIT model)
    • Glm4Config configuration class: Glm4ForCausalLM (GLM4 model)
    • Glm4MoeConfig configuration class: Glm4MoeForCausalLM (Glm4MoE model)
    • GlmConfig configuration class: GlmForCausalLM (GLM model)
    • GotOcr2Config configuration class: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
    • GptOssConfig configuration class: GptOssForCausalLM (GptOss model)
    • GraniteConfig configuration class: GraniteForCausalLM (Granite model)
    • GraniteMoeConfig configuration class: GraniteMoeForCausalLM (GraniteMoeMoe model)
    • GraniteMoeHybridConfig configuration class: GraniteMoeHybridForCausalLM (GraniteMoeHybrid model)
    • GraniteMoeSharedConfig configuration class: GraniteMoeSharedForCausalLM (GraniteMoeSharedMoe model)
    • HeliumConfig configuration class: HeliumForCausalLM (Helium model)
    • HunYuanDenseV1Config configuration class: HunYuanDenseV1ForCausalLM (HunYuanDenseV1 model)
    • HunYuanMoEV1Config configuration class: HunYuanMoEV1ForCausalLM (HunYuanMoeV1 model)
    • JambaConfig configuration class: JambaForCausalLM (Jamba model)
    • JetMoeConfig configuration class: JetMoeForCausalLM (JetMoe model)
    • Lfm2Config configuration class: Lfm2ForCausalLM (Lfm2 model)
    • Llama4Config configuration class: Llama4ForCausalLM (Llama4 model)
    • Llama4TextConfig configuration class: Llama4ForCausalLM (Llama4ForCausalLM model)
    • LlamaConfig configuration class: LlamaForCausalLM (LLaMA model)
    • MBartConfig configuration class: MBartForCausalLM (mBART model)
    • Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
    • MambaConfig configuration class: MambaForCausalLM (Mamba model)
    • MarianConfig configuration class: MarianForCausalLM (Marian model)
    • MegaConfig configuration class: MegaForCausalLM (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForCausalLM (Megatron-BERT model)
    • MiniMaxConfig configuration class: MiniMaxForCausalLM (MiniMax model)
    • MistralConfig configuration class: MistralForCausalLM (Mistral model)
    • MixtralConfig configuration class: MixtralForCausalLM (Mixtral model)
    • MllamaConfig configuration class: MllamaForCausalLM (Mllama model)
    • ModernBertDecoderConfig configuration class: ModernBertDecoderForCausalLM (ModernBertDecoder model)
    • MoshiConfig configuration class: MoshiForCausalLM (Moshi model)
    • MptConfig configuration class: MptForCausalLM (MPT model)
    • MusicgenConfig configuration class: MusicgenForCausalLM (MusicGen model)
    • MusicgenMelodyConfig configuration class: MusicgenMelodyForCausalLM (MusicGen Melody model)
    • MvpConfig configuration class: MvpForCausalLM (MVP model)
    • NemotronConfig configuration class: NemotronForCausalLM (Nemotron model)
    • OPTConfig configuration class: OPTForCausalLM (OPT model)
    • Olmo2Config configuration class: Olmo2ForCausalLM (OLMo2 model)
    • OlmoConfig configuration class: OlmoForCausalLM (OLMo model)
    • OlmoeConfig configuration class: OlmoeForCausalLM (OLMoE model)
    • OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
    • OpenLlamaConfig configuration class: OpenLlamaForCausalLM (OpenLlama model)
    • PLBartConfig configuration class: PLBartForCausalLM (PLBart model)
    • PegasusConfig configuration class: PegasusForCausalLM (Pegasus model)
    • PersimmonConfig configuration class: PersimmonForCausalLM (Persimmon model)
    • Phi3Config configuration class: Phi3ForCausalLM (Phi3 model)
    • Phi4MultimodalConfig configuration class: Phi4MultimodalForCausalLM (Phi4Multimodal model)
    • PhiConfig configuration class: PhiForCausalLM (Phi model)
    • PhimoeConfig configuration class: PhimoeForCausalLM (Phimoe model)
    • ProphetNetConfig configuration class: ProphetNetForCausalLM (ProphetNet model)
    • QDQBertConfig configuration class: QDQBertLMHeadModel (QDQBert model)
    • Qwen2Config configuration class: Qwen2ForCausalLM (Qwen2 model)
    • Qwen2MoeConfig configuration class: Qwen2MoeForCausalLM (Qwen2MoE model)
    • Qwen3Config configuration class: Qwen3ForCausalLM (Qwen3 model)
    • Qwen3MoeConfig configuration class: Qwen3MoeForCausalLM (Qwen3MoE model)
    • RecurrentGemmaConfig configuration class: RecurrentGemmaForCausalLM (RecurrentGemma model)
    • ReformerConfig configuration class: ReformerModelWithLMHead (Reformer model)
    • RemBertConfig configuration class: RemBertForCausalLM (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForCausalLM (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForCausalLM (RoFormer model)
    • RobertaConfig configuration class: RobertaForCausalLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
    • RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
    • SeedOssConfig configuration class: SeedOssForCausalLM (SeedOss model)
    • SmolLM3Config configuration class: SmolLM3ForCausalLM (SmolLM3 model)
    • Speech2Text2Config configuration class: Speech2Text2ForCausalLM (Speech2Text2 model)
    • StableLmConfig configuration class: StableLmForCausalLM (StableLm model)
    • Starcoder2Config configuration class: Starcoder2ForCausalLM (Starcoder2 model)
    • TrOCRConfig configuration class: TrOCRForCausalLM (TrOCR model)
    • TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
    • WhisperConfig configuration class: WhisperForCausalLM (Whisper model)
    • XGLMConfig configuration class: XGLMForCausalLM (XGLM model)
    • XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
    • XLMProphetNetConfig configuration class: XLMProphetNetForCausalLM (XLM-ProphetNet model)
    • XLMRobertaConfig configuration class: XLMRobertaForCausalLM (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
    • XmodConfig configuration class: XmodForCausalLM (X-MOD model)
    • Zamba2Config configuration class: Zamba2ForCausalLM (Zamba2 model)
    • ZambaConfig configuration class: ZambaForCausalLM (Zamba model)
    • xLSTMConfig configuration class: xLSTMForCausalLM (xLSTM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
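The default-selection rule described for attn_implementation can be approximated as a version check. This sketch assumes SDPA is usable whenever torch>=2.1.1 is installed and ignores pre-release suffixes for simplicity; the library also checks whether the specific model supports SDPA, which is not modeled here. parse_version and default_attn_implementation are hypothetical helpers.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric release segment, e.g. "2.3.1+cu121" -> (2, 3, 1)."""
    release = version.split("+")[0]  # drop local build tags such as "+cu121"
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "a0" or "rc1"
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def default_attn_implementation(torch_version: Optional[str]) -> str:
    """Pick "sdpa" for torch>=2.1.1, otherwise the manual "eager" implementation."""
    if torch_version is not None and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(default_attn_implementation("2.3.1+cu121"))  # sdpa
print(default_attn_implementation("2.0.1"))  # eager
```

Passing attn_implementation explicitly skips this default selection altogether.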

Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
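To make the difference concrete, here is a toy model (hypothetical, pure Python) in which from_config() produces freshly initialized weights while from_pretrained() builds the same architecture and then overwrites the weights with saved values. To keep it self-contained, this from_pretrained() takes the saved weights directly instead of a path.

```python
import random
from typing import Dict, List, Sequence


class ToyModel:
    """Hypothetical model whose parameters are a flat list of floats."""

    def __init__(self, size: int, seed: int = 0):
        rng = random.Random(seed)
        # Freshly initialized (random) weights: this is all from_config() gives you.
        self.weights: List[float] = [rng.uniform(-0.1, 0.1) for _ in range(size)]

    @classmethod
    def from_config(cls, config: Dict[str, int]) -> "ToyModel":
        return cls(config["size"])

    @classmethod
    def from_pretrained(cls, config: Dict[str, int], saved_weights: Sequence[float]) -> "ToyModel":
        model = cls.from_config(config)  # build the architecture described by the config
        model.weights = list(saved_weights)  # then replace the random weights with trained ones
        return model


config = {"size": 3}
trained = [0.5, -1.25, 2.0]
print(ToyModel.from_config(config).weights == trained)  # False
print(ToyModel.from_pretrained(config, trained).weights == trained)  # True
```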

Examples:

>>> from transformers import AutoConfig, AutoModelForCausalLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForCausalLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from a saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not attempt to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
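The missing and unexpected keys reported by output_loading_info can be thought of as set differences between the parameter names the model expects and those present in the checkpoint. A sketch under that assumption: loading_info is a hypothetical helper, it covers only the first two entries (no error messages), and the parameter names are illustrative.

```python
from typing import Dict, Iterable, List


def loading_info(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the parameter names a model expects with those found in a checkpoint."""
    expected = set(model_keys)
    found = set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint (left newly initialized).
        "missing_keys": sorted(expected - found),
        # Present in the checkpoint but unused by the model (ignored).
        "unexpected_keys": sorted(found - expected),
    }


info = loading_info(
    model_keys=["bert.embeddings.word_embeddings.weight", "cls.predictions.bias"],
    checkpoint_keys=["bert.embeddings.word_embeddings.weight", "cls.seq_relationship.weight"],
)
print(info)
# {'missing_keys': ['cls.predictions.bias'], 'unexpected_keys': ['cls.seq_relationship.weight']}
```

A long missing_keys list usually means the checkpoint was trained with a different head than the one the auto class selected.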

Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • apertus: ApertusForCausalLM (Apertus model)
  • arcee: ArceeForCausalLM (Arcee model)
  • aria_text: AriaTextForCausalLM (AriaText model)
  • bamba: BambaForCausalLM (Bamba model)
  • bart: BartForCausalLM (BART model)
  • bert: BertLMHeadModel (BERT model)
  • bert-generation: BertGenerationDecoder (Bert Generation model)
  • big_bird: BigBirdForCausalLM (BigBird model)
  • bigbird_pegasus: BigBirdPegasusForCausalLM (BigBird-Pegasus model)
  • biogpt: BioGptForCausalLM (BioGpt model)
  • bitnet: BitNetForCausalLM (BitNet model)
  • blenderbot: BlenderbotForCausalLM (Blenderbot model)
  • blenderbot-small: BlenderbotSmallForCausalLM (BlenderbotSmall model)
  • bloom: BloomForCausalLM (BLOOM model)
  • camembert: CamembertForCausalLM (CamemBERT model)
  • code_llama: LlamaForCausalLM (CodeLlama model)
  • codegen: CodeGenForCausalLM (CodeGen model)
  • cohere: CohereForCausalLM (Cohere model)
  • cohere2: Cohere2ForCausalLM (Cohere2 model)
  • cpmant: CpmAntForCausalLM (CPM-Ant model)
  • ctrl: CTRLLMHeadModel (CTRL model)
  • data2vec-text: Data2VecTextForCausalLM (Data2VecText model)
  • dbrx: DbrxForCausalLM (DBRX model)
  • deepseek_v2: DeepseekV2ForCausalLM (DeepSeek-V2 model)
  • deepseek_v3: DeepseekV3ForCausalLM (DeepSeek-V3 model)
  • diffllama: DiffLlamaForCausalLM (DiffLlama model)
  • doge: DogeForCausalLM (Doge model)
  • dots1: Dots1ForCausalLM (dots1 model)
  • electra: ElectraForCausalLM (ELECTRA model)
  • emu3: Emu3ForCausalLM (Emu3 model)
  • ernie: ErnieForCausalLM (ERNIE model)
  • ernie4_5: Ernie4_5ForCausalLM (Ernie4_5 model)
  • ernie4_5_moe: Ernie4_5_MoeForCausalLM (Ernie4_5_MoE model)
  • exaone4: Exaone4ForCausalLM (EXAONE-4.0 model)
  • falcon: FalconForCausalLM (Falcon model)
  • falcon_h1: FalconH1ForCausalLM (FalconH1 model)
  • falcon_mamba: FalconMambaForCausalLM (FalconMamba model)
  • fuyu: FuyuForCausalLM (Fuyu model)
  • gemma: GemmaForCausalLM (Gemma model)
  • gemma2: Gemma2ForCausalLM (Gemma2 model)
  • gemma3: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
  • gemma3_text: Gemma3ForCausalLM (Gemma3ForCausalLM model)
  • gemma3n: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
  • gemma3n_text: Gemma3nForCausalLM (Gemma3nForCausalLM model)
  • git: GitForCausalLM (GIT model)
  • glm: GlmForCausalLM (GLM model)
  • glm4: Glm4ForCausalLM (GLM4 model)
  • glm4_moe: Glm4MoeForCausalLM (Glm4MoE model)
  • got_ocr2: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
  • gpt-sw3: GPT2LMHeadModel (GPT-Sw3 model)
  • gpt2: GPT2LMHeadModel (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeForCausalLM (GPTBigCode model)
  • gpt_neo: GPTNeoForCausalLM (GPT Neo model)
  • gpt_neox: GPTNeoXForCausalLM (GPT NeoX model)
  • gpt_neox_japanese: GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
  • gpt_oss: GptOssForCausalLM (GptOss model)
  • gptj: GPTJForCausalLM (GPT-J model)
  • granite: GraniteForCausalLM (Granite model)
  • granitemoe: GraniteMoeForCausalLM (GraniteMoeMoe model)
  • granitemoehybrid: GraniteMoeHybridForCausalLM (GraniteMoeHybrid model)
  • granitemoeshared: GraniteMoeSharedForCausalLM (GraniteMoeSharedMoe model)
  • helium: HeliumForCausalLM (Helium model)
  • hunyuan_v1_dense: HunYuanDenseV1ForCausalLM (HunYuanDenseV1 model)
  • hunyuan_v1_moe: HunYuanMoEV1ForCausalLM (HunYuanMoeV1 model)
  • jamba: JambaForCausalLM (Jamba model)
  • jetmoe: JetMoeForCausalLM (JetMoe model)
  • lfm2: Lfm2ForCausalLM (Lfm2 model)
  • llama: LlamaForCausalLM (LLaMA model)
  • llama4: Llama4ForCausalLM (Llama4 model)
  • llama4_text: Llama4ForCausalLM (Llama4ForCausalLM model)
  • mamba: MambaForCausalLM (Mamba model)
  • mamba2: Mamba2ForCausalLM (mamba2 model)
  • marian: MarianForCausalLM (Marian model)
  • mbart: MBartForCausalLM (mBART model)
  • mega: MegaForCausalLM (MEGA model)
  • megatron-bert: MegatronBertForCausalLM (Megatron-BERT model)
  • minimax: MiniMaxForCausalLM (MiniMax model)
  • mistral: MistralForCausalLM (Mistral model)
  • mixtral: MixtralForCausalLM (Mixtral model)
  • mllama: MllamaForCausalLM (Mllama model)
  • modernbert-decoder: ModernBertDecoderForCausalLM (ModernBertDecoder model)
  • moshi: MoshiForCausalLM (Moshi model)
  • mpt: MptForCausalLM (MPT model)
  • musicgen: MusicgenForCausalLM (MusicGen model)
  • musicgen_melody: MusicgenMelodyForCausalLM (MusicGen Melody model)
  • mvp: MvpForCausalLM (MVP model)
  • nemotron: NemotronForCausalLM (Nemotron model)
  • olmo: OlmoForCausalLM (OLMo model)
  • olmo2: Olmo2ForCausalLM (OLMo2 model)
  • olmoe: OlmoeForCausalLM (OLMoE model)
  • open-llama: OpenLlamaForCausalLM (OpenLlama model)
  • openai-gpt: OpenAIGPTLMHeadModel (OpenAI GPT model)
  • opt: OPTForCausalLM (OPT model)
  • pegasus: PegasusForCausalLM (Pegasus model)
  • persimmon: PersimmonForCausalLM (Persimmon model)
  • phi: PhiForCausalLM (Phi model)
  • phi3: Phi3ForCausalLM (Phi3 model)
  • phi4_multimodal: Phi4MultimodalForCausalLM (Phi4Multimodal model)
  • phimoe: PhimoeForCausalLM (Phimoe model)
  • plbart: PLBartForCausalLM (PLBart model)
  • prophetnet: ProphetNetForCausalLM (ProphetNet model)
  • qdqbert: QDQBertLMHeadModel (QDQBert model)
  • qwen2: Qwen2ForCausalLM (Qwen2 model)
  • qwen2_moe: Qwen2MoeForCausalLM (Qwen2MoE model)
  • qwen3: Qwen3ForCausalLM (Qwen3 model)
  • qwen3_moe: Qwen3MoeForCausalLM (Qwen3MoE model)
  • recurrent_gemma: RecurrentGemmaForCausalLM (RecurrentGemma model)
  • reformer: ReformerModelWithLMHead (Reformer model)
  • rembert: RemBertForCausalLM (RemBERT model)
  • roberta: RobertaForCausalLM (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForCausalLM (RoCBert model)
  • roformer: RoFormerForCausalLM (RoFormer model)
  • rwkv: RwkvForCausalLM (RWKV model)
  • seed_oss: SeedOssForCausalLM (SeedOss model)
  • smollm3: SmolLM3ForCausalLM (SmolLM3 model)
  • speech_to_text_2: Speech2Text2ForCausalLM (Speech2Text2 model)
  • stablelm: StableLmForCausalLM (StableLm model)
  • starcoder2: Starcoder2ForCausalLM (Starcoder2 model)
  • transfo-xl: TransfoXLLMHeadModel (Transformer-XL model)
  • trocr: TrOCRForCausalLM (TrOCR model)
  • whisper: WhisperForCausalLM (Whisper model)
  • xglm: XGLMForCausalLM (XGLM model)
  • xlm: XLMWithLMHeadModel (XLM model)
  • xlm-prophetnet: XLMProphetNetForCausalLM (XLM-ProphetNet model)
  • xlm-roberta: XLMRobertaForCausalLM (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
  • xlnet: XLNetLMHeadModel (XLNet model)
  • xlstm: xLSTMForCausalLM (xLSTM model)
  • xmod: XmodForCausalLM (X-MOD model)
  • zamba: ZambaForCausalLM (Zamba model)
  • zamba2: Zamba2ForCausalLM (Zamba2 model)
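When model_type is missing, the fallback matches the keys of this list against pretrained_model_name_or_path. Because keys overlap (roberta, xlm-roberta, xlm-roberta-xl), a naive first match could pick the wrong class, so the sketch below tries longer patterns first. Treat that ordering, and the case-sensitive substring test, as assumptions about the fallback rather than a description of its exact implementation; CAUSAL_LM_MAPPING and guess_class_from_name are hypothetical.

```python
from typing import Dict, Optional

# A handful of entries from the list above: model_type -> class name.
CAUSAL_LM_MAPPING = {
    "gpt2": "GPT2LMHeadModel",
    "gpt-sw3": "GPT2LMHeadModel",
    "roberta": "RobertaForCausalLM",
    "roberta-prelayernorm": "RobertaPreLayerNormForCausalLM",
    "xlm-roberta": "XLMRobertaForCausalLM",
    "xlm-roberta-xl": "XLMRobertaXLForCausalLM",
}


def guess_class_from_name(name: str, mapping: Dict[str, str] = CAUSAL_LM_MAPPING) -> Optional[str]:
    """Return the class for the longest model_type key found as a substring of name."""
    # Longer patterns first, so "xlm-roberta-xl" wins over "xlm-roberta" and "roberta".
    for pattern in sorted(mapping, key=len, reverse=True):
        if pattern in name:
            return mapping[pattern]
    return None  # no key matched; the real auto class raises an error in this case


print(guess_class_from_name("facebook/xlm-roberta-xl"))  # XLMRobertaXLForCausalLM
```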

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().
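A toy dropout layer (hypothetical, modeled on the training flag of torch.nn.Module) shows what switching modes changes: in evaluation mode it passes inputs through unchanged, and in training mode it randomly zeroes elements and rescales the rest.

```python
import random
from typing import List


class ToyDropout:
    """Hypothetical dropout layer with a torch.nn.Module-style training flag."""

    def __init__(self, p: float = 0.5, seed: int = 0):
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p
        self.training = True  # freshly constructed modules start in training mode
        self._rng = random.Random(seed)

    def train(self, mode: bool = True) -> "ToyDropout":
        self.training = mode
        return self

    def eval(self) -> "ToyDropout":
        # Equivalent to train(False), as with torch.nn.Module.eval().
        return self.train(False)

    def __call__(self, xs: List[float]) -> List[float]:
        if not self.training:
            return list(xs)  # evaluation mode: identity
        keep = 1.0 - self.p
        # Training mode: zero each element with probability p, rescale survivors by 1/keep.
        return [x / keep if self._rng.random() < keep else 0.0 for x in xs]


layer = ToyDropout(p=0.5).eval()
print(layer([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

Forgetting to call train() before fine-tuning therefore silently disables regularization such as dropout.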

Examples:

>>> from transformers import AutoConfig, AutoModelForCausalLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForCausalLM.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForCausalLM

class transformers.TFAutoModelForCausalLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BertConfig configuration class: TFBertLMHeadModel (BERT model)
    • CTRLConfig configuration class: TFCTRLLMHeadModel (CTRL model)
    • CamembertConfig configuration class: TFCamembertForCausalLM (CamemBERT model)
    • GPT2Config configuration class: TFGPT2LMHeadModel (OpenAI GPT-2 model)
    • GPTJConfig configuration class: TFGPTJForCausalLM (GPT-J model)
    • MistralConfig configuration class: TFMistralForCausalLM (Mistral model)
    • OPTConfig configuration class: TFOPTForCausalLM (OPT model)
    • OpenAIGPTConfig configuration class: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
    • RemBertConfig configuration class: TFRemBertForCausalLM (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForCausalLM (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForCausalLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
    • TransfoXLConfig configuration class: TFTransfoXLLMHeadModel (Transformer-XL model)
    • XGLMConfig configuration class: TFXGLMForCausalLM (XGLM model)
    • XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForCausalLM (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetLMHeadModel (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForCausalLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForCausalLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not attempt to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
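The two routing rules for kwargs described above can be pictured with a small pure-Python sketch. It models only those two documented rules; ToyConfig and split_kwargs are hypothetical names, not part of Transformers, and the library's real implementation handles more cases.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a PretrainedConfig subclass."""
    hidden_size: int = 768
    output_attentions: bool = False


def split_kwargs(config, kwargs, config_provided):
    """Return (config, model_init_kwargs) following the two documented rules."""
    if config_provided:
        # A config was passed explicitly: every kwarg goes to the model's __init__.
        return config, dict(kwargs)
    config_attrs = {f.name for f in fields(config)}
    model_kwargs = {}
    for key, value in kwargs.items():
        if key in config_attrs:
            # Keys matching a configuration attribute override the loaded value.
            setattr(config, key, value)
        else:
            # Remaining keys are forwarded to the model's __init__.
            model_kwargs[key] = value
    return config, model_kwargs


config, rest = split_kwargs(ToyConfig(), {"output_attentions": True, "my_flag": 1}, config_provided=False)
print(config.output_attentions, rest)  # True {'my_flag': 1}
```

With an explicit config, the same call would leave config.output_attentions untouched and forward both keys to the model.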

Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bert: TFBertLMHeadModel (BERT model)
  • camembert: TFCamembertForCausalLM (CamemBERT model)
  • ctrl: TFCTRLLMHeadModel (CTRL model)
  • gpt-sw3: TFGPT2LMHeadModel (GPT-Sw3 model)
  • gpt2: TFGPT2LMHeadModel (OpenAI GPT-2 model)
  • gptj: TFGPTJForCausalLM (GPT-J model)
  • mistral: TFMistralForCausalLM (Mistral model)
  • openai-gpt: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
  • opt: TFOPTForCausalLM (OPT model)
  • rembert: TFRemBertForCausalLM (RemBERT model)
  • roberta: TFRobertaForCausalLM (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForCausalLM (RoFormer model)
  • transfo-xl: TFTransfoXLLMHeadModel (Transformer-XL model)
  • xglm: TFXGLMForCausalLM (XGLM model)
  • xlm: TFXLMWithLMHeadModel (XLM model)
  • xlm-roberta: TFXLMRobertaForCausalLM (XLM-RoBERTa model)
  • xlnet: TFXLNetLMHeadModel (XLNet model)
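To make the selection rule concrete, here is a rough pure-Python sketch of a lookup keyed on model_type with a substring fallback on the name or path. The mapping is a hypothetical subset of the table above, select_class_name is an illustrative name, and trying longer patterns first (so that "xlm-roberta" is not mistaken for "xlm" or "roberta", and "roberta" not for "bert") is an assumption of this sketch; the library's actual matching rules may differ in detail.

```python
# Hypothetical subset of the table above: model_type -> model class name.
CAUSAL_LM_MAPPING = {
    "bert": "TFBertLMHeadModel",
    "gpt2": "TFGPT2LMHeadModel",
    "gpt-sw3": "TFGPT2LMHeadModel",
    "roberta": "TFRobertaForCausalLM",
    "roberta-prelayernorm": "TFRobertaPreLayerNormForCausalLM",
    "xlm": "TFXLMWithLMHeadModel",
    "xlm-roberta": "TFXLMRobertaForCausalLM",
}


def select_class_name(name_or_path, model_type=None):
    """Pick a class name from model_type, else by substring matching on the name."""
    if model_type is not None:
        return CAUSAL_LM_MAPPING[model_type]
    # Fallback: try longer keys first, since "roberta" also contains "bert".
    for pattern in sorted(CAUSAL_LM_MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return CAUSAL_LM_MAPPING[pattern]
    raise ValueError(f"Unrecognized model in {name_or_path!r}")


print(select_class_name("FacebookAI/roberta-base"))  # TFRobertaForCausalLM
```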

Examples:

>>> from transformers import AutoConfig, TFAutoModelForCausalLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForCausalLM.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForCausalLM

class transformers.FlaxAutoModelForCausalLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BartConfig configuration class: FlaxBartForCausalLM (BART model)
    • BertConfig configuration class: FlaxBertForCausalLM (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForCausalLM (BigBird model)
    • BloomConfig configuration class: FlaxBloomForCausalLM (BLOOM model)
    • ElectraConfig configuration class: FlaxElectraForCausalLM (ELECTRA model)
    • GPT2Config configuration class: FlaxGPT2LMHeadModel (OpenAI GPT-2 model)
    • GPTJConfig configuration class: FlaxGPTJForCausalLM (GPT-J model)
    • GPTNeoConfig configuration class: FlaxGPTNeoForCausalLM (GPT Neo model)
    • GemmaConfig configuration class: FlaxGemmaForCausalLM (Gemma model)
    • LlamaConfig configuration class: FlaxLlamaForCausalLM (LLaMA model)
    • MistralConfig configuration class: FlaxMistralForCausalLM (Mistral model)
    • OPTConfig configuration class: FlaxOPTForCausalLM (OPT model)
    • RobertaConfig configuration class: FlaxRobertaForCausalLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
    • XGLMConfig configuration class: FlaxXGLMForCausalLM (XGLM model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForCausalLM (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
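A rough sketch of the from_config() dispatch described above: the model class is looked up from the type of the configuration object, and the model is built from the configuration alone, so no pretrained weights are involved. All classes here are hypothetical stand-ins, not the real Transformers classes.

```python
from dataclasses import dataclass


@dataclass
class ToyGPT2Config:
    """Hypothetical stand-in for a GPT-2 configuration class."""
    n_layer: int = 12


@dataclass
class ToyLlamaConfig:
    """Hypothetical stand-in for a LLaMA configuration class."""
    num_hidden_layers: int = 32


class ToyGPT2LMHeadModel:
    def __init__(self, config):
        self.config = config


class ToyLlamaForCausalLM:
    def __init__(self, config):
        self.config = config


# Keyed on the configuration class, like the from_config() table above.
MODEL_FOR_CONFIG = {
    ToyGPT2Config: ToyGPT2LMHeadModel,
    ToyLlamaConfig: ToyLlamaForCausalLM,
}


def from_config(config):
    """Build (but do not load weights into) the model matching type(config)."""
    model_cls = MODEL_FOR_CONFIG.get(type(config))
    if model_cls is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    # Only the architecture is constructed; no pretrained weights are read here.
    return model_cls(config)


model = from_config(ToyLlamaConfig(num_hidden_layers=2))
print(type(model).__name__)  # ToyLlamaForCausalLM
```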

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForCausalLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForCausalLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
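The proxies dictionary mixes per-scheme keys ("http") with per-host keys ("http://hostname"). Assuming it follows the convention of the Python requests library, which the example above suggests, the most specific matching key wins. The sketch below models that lookup order; select_proxy is a hypothetical helper, not a Transformers function.

```python
from urllib.parse import urlparse


def select_proxy(url, proxies):
    """Return the proxy for url, preferring the most specific key.

    Lookup order modeled on the requests library:
    "scheme://hostname", then "scheme", then "all://hostname", then "all".
    """
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    for key in (
        f"{parts.scheme}://{parts.hostname}",
        parts.scheme,
        f"all://{parts.hostname}",
        "all",
    ):
        if key in proxies:
            return proxies[key]
    # No matching key: the request would go out without a proxy.
    return None


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/model.bin", proxies))     # foo.bar:4012
print(select_proxy("http://example.com/model.bin", proxies))  # foo.bar:3128
```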

Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bart: FlaxBartForCausalLM (BART model)
  • bert: FlaxBertForCausalLM (BERT model)
  • big_bird: FlaxBigBirdForCausalLM (BigBird model)
  • bloom: FlaxBloomForCausalLM (BLOOM model)
  • electra: FlaxElectraForCausalLM (ELECTRA model)
  • gemma: FlaxGemmaForCausalLM (Gemma model)
  • gpt-sw3: FlaxGPT2LMHeadModel (GPT-Sw3 model)
  • gpt2: FlaxGPT2LMHeadModel (OpenAI GPT-2 model)
  • gpt_neo: FlaxGPTNeoForCausalLM (GPT Neo model)
  • gptj: FlaxGPTJForCausalLM (GPT-J model)
  • llama: FlaxLlamaForCausalLM (LLaMA model)
  • mistral: FlaxMistralForCausalLM (Mistral model)
  • opt: FlaxOPTForCausalLM (OPT model)
  • roberta: FlaxRobertaForCausalLM (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
  • xglm: FlaxXGLMForCausalLM (XGLM model)
  • xlm-roberta: FlaxXLMRobertaForCausalLM (XLM-RoBERTa model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForCausalLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForCausalLM.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForMaskedLM

class transformers.AutoModelForMaskedLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForMaskedLM (ALBERT model)
    • BartConfig configuration class: BartForConditionalGeneration (BART model)
    • BertConfig configuration class: BertForMaskedLM (BERT model)
    • BigBirdConfig configuration class: BigBirdForMaskedLM (BigBird model)
    • CamembertConfig configuration class: CamembertForMaskedLM (CamemBERT model)
    • ConvBertConfig configuration class: ConvBertForMaskedLM (ConvBERT model)
    • Data2VecTextConfig configuration class: Data2VecTextForMaskedLM (Data2VecText model)
    • DebertaConfig configuration class: DebertaForMaskedLM (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2ForMaskedLM (DeBERTa-v2 model)
    • DistilBertConfig configuration class: DistilBertForMaskedLM (DistilBERT model)
    • ElectraConfig configuration class: ElectraForMaskedLM (ELECTRA model)
    • ErnieConfig configuration class: ErnieForMaskedLM (ERNIE model)
    • EsmConfig configuration class: EsmForMaskedLM (ESM model)
    • FNetConfig configuration class: FNetForMaskedLM (FNet model)
    • FlaubertConfig configuration class: FlaubertWithLMHeadModel (FlauBERT model)
    • FunnelConfig configuration class: FunnelForMaskedLM (Funnel Transformer model)
    • IBertConfig configuration class: IBertForMaskedLM (I-BERT model)
    • LayoutLMConfig configuration class: LayoutLMForMaskedLM (LayoutLM model)
    • LongformerConfig configuration class: LongformerForMaskedLM (Longformer model)
    • LukeConfig configuration class: LukeForMaskedLM (LUKE model)
    • MBartConfig configuration class: MBartForConditionalGeneration (mBART model)
    • MPNetConfig configuration class: MPNetForMaskedLM (MPNet model)
    • MegaConfig configuration class: MegaForMaskedLM (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForMaskedLM (Megatron-BERT model)
    • MobileBertConfig configuration class: MobileBertForMaskedLM (MobileBERT model)
    • ModernBertConfig configuration class: ModernBertForMaskedLM (ModernBERT model)
    • MraConfig configuration class: MraForMaskedLM (MRA model)
    • MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
    • NezhaConfig configuration class: NezhaForMaskedLM (Nezha model)
    • NystromformerConfig configuration class: NystromformerForMaskedLM (Nyströmformer model)
    • PerceiverConfig configuration class: PerceiverForMaskedLM (Perceiver model)
    • QDQBertConfig configuration class: QDQBertForMaskedLM (QDQBert model)
    • ReformerConfig configuration class: ReformerForMaskedLM (Reformer model)
    • RemBertConfig configuration class: RemBertForMaskedLM (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForMaskedLM (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForMaskedLM (RoFormer model)
    • RobertaConfig configuration class: RobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • SqueezeBertConfig configuration class: SqueezeBertForMaskedLM (SqueezeBERT model)
    • TapasConfig configuration class: TapasForMaskedLM (TAPAS model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForMaskedLM (Wav2Vec2 model)
    • XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForMaskedLM (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
    • XmodConfig configuration class: XmodForMaskedLM (X-MOD model)
    • YosoConfig configuration class: YosoForMaskedLM (YOSO model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForMaskedLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForMaskedLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
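When you pass your own state_dict, or request output_loading_info=True, the practical question is how the supplied keys line up with the model's parameters. The following sketch computes "missing" and "unexpected" keys by set difference; loading_info is a hypothetical helper, the parameter names are illustrative, and the dictionary the library actually returns may carry additional entries (such as error messages).

```python
def loading_info(expected_keys, state_dict):
    """Compare a model's parameter names with the keys of a state dict."""
    expected = set(expected_keys)
    provided = set(state_dict)
    return {
        # Parameters the model needs but the state dict does not supply.
        "missing_keys": sorted(expected - provided),
        # Entries in the state dict that match no parameter of the model.
        "unexpected_keys": sorted(provided - expected),
    }


model_keys = ["bert.embeddings.word_embeddings.weight", "cls.predictions.bias"]
weights = {"bert.embeddings.word_embeddings.weight": [0.0], "bert.pooler.dense.weight": [0.0]}
info = loading_info(model_keys, weights)
print(info["missing_keys"])     # ['cls.predictions.bias']
print(info["unexpected_keys"])  # ['bert.pooler.dense.weight']
```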

Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForMaskedLM (ALBERT model)
  • bart: BartForConditionalGeneration (BART model)
  • bert: BertForMaskedLM (BERT model)
  • big_bird: BigBirdForMaskedLM (BigBird model)
  • camembert: CamembertForMaskedLM (CamemBERT model)
  • convbert: ConvBertForMaskedLM (ConvBERT model)
  • data2vec-text: Data2VecTextForMaskedLM (Data2VecText model)
  • deberta: DebertaForMaskedLM (DeBERTa model)
  • deberta-v2: DebertaV2ForMaskedLM (DeBERTa-v2 model)
  • distilbert: DistilBertForMaskedLM (DistilBERT model)
  • electra: ElectraForMaskedLM (ELECTRA model)
  • ernie: ErnieForMaskedLM (ERNIE model)
  • esm: EsmForMaskedLM (ESM model)
  • flaubert: FlaubertWithLMHeadModel (FlauBERT model)
  • fnet: FNetForMaskedLM (FNet model)
  • funnel: FunnelForMaskedLM (Funnel Transformer model)
  • ibert: IBertForMaskedLM (I-BERT model)
  • layoutlm: LayoutLMForMaskedLM (LayoutLM model)
  • longformer: LongformerForMaskedLM (Longformer model)
  • luke: LukeForMaskedLM (LUKE model)
  • mbart: MBartForConditionalGeneration (mBART model)
  • mega: MegaForMaskedLM (MEGA model)
  • megatron-bert: MegatronBertForMaskedLM (Megatron-BERT model)
  • mobilebert: MobileBertForMaskedLM (MobileBERT model)
  • modernbert: ModernBertForMaskedLM (ModernBERT model)
  • mpnet: MPNetForMaskedLM (MPNet model)
  • mra: MraForMaskedLM (MRA model)
  • mvp: MvpForConditionalGeneration (MVP model)
  • nezha: NezhaForMaskedLM (Nezha model)
  • nystromformer: NystromformerForMaskedLM (Nyströmformer model)
  • perceiver: PerceiverForMaskedLM (Perceiver model)
  • qdqbert: QDQBertForMaskedLM (QDQBert model)
  • reformer: ReformerForMaskedLM (Reformer model)
  • rembert: RemBertForMaskedLM (RemBERT model)
  • roberta: RobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForMaskedLM (RoCBert model)
  • roformer: RoFormerForMaskedLM (RoFormer model)
  • squeezebert: SqueezeBertForMaskedLM (SqueezeBERT model)
  • tapas: TapasForMaskedLM (TAPAS model)
  • wav2vec2: Wav2Vec2ForMaskedLM (Wav2Vec2 model)
  • xlm: XLMWithLMHeadModel (XLM model)
  • xlm-roberta: XLMRobertaForMaskedLM (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
  • xmod: XmodForMaskedLM (X-MOD model)
  • yoso: YosoForMaskedLM (YOSO model)

The model is set in evaluation mode by default using model.eval() (for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
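The effect of eval() versus train() on dropout can be shown with a toy layer. ToyDropout is a hypothetical stand-in written in plain Python, not torch.nn.Dropout; it assumes the common inverted-dropout scaling by 1 / (1 - p) during training.

```python
import random


class ToyDropout:
    """Hypothetical stand-in for a dropout layer (not torch.nn.Dropout)."""

    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.training = True
        self._rng = random.Random(seed)

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        # Same convention as PyTorch modules: eval() is train(False).
        return self.train(False)

    def __call__(self, xs):
        if not self.training:
            # Evaluation mode: dropout is a no-op, so outputs are deterministic.
            return list(xs)
        # Training mode: zero each element with probability p and rescale the
        # survivors by 1 / (1 - p) so the expected value is unchanged.
        scale = 1.0 / (1.0 - self.p)
        return [0.0 if self._rng.random() < self.p else x * scale for x in xs]


layer = ToyDropout(p=0.5).eval()  # analogous to a freshly loaded model
print(layer([1.0, 2.0, 3.0]))     # [1.0, 2.0, 3.0]
layer.train()                     # re-enable dropout before fine-tuning
```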

Examples:

>>> from transformers import AutoConfig, AutoModelForMaskedLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMaskedLM.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForMaskedLM

class transformers.TFAutoModelForMaskedLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForMaskedLM (ALBERT model)
    • BertConfig configuration class: TFBertForMaskedLM (BERT model)
    • CamembertConfig configuration class: TFCamembertForMaskedLM (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertForMaskedLM (ConvBERT model)
    • DebertaConfig configuration class: TFDebertaForMaskedLM (DeBERTa model)
    • DebertaV2Config configuration class: TFDebertaV2ForMaskedLM (DeBERTa-v2 model)
    • DistilBertConfig configuration class: TFDistilBertForMaskedLM (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForMaskedLM (ELECTRA model)
    • EsmConfig configuration class: TFEsmForMaskedLM (ESM model)
    • FlaubertConfig configuration class: TFFlaubertWithLMHeadModel (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForMaskedLM (Funnel Transformer model)
    • LayoutLMConfig configuration class: TFLayoutLMForMaskedLM (LayoutLM model)
    • LongformerConfig configuration class: TFLongformerForMaskedLM (Longformer model)
    • MPNetConfig configuration class: TFMPNetForMaskedLM (MPNet model)
    • MobileBertConfig configuration class: TFMobileBertForMaskedLM (MobileBERT model)
    • RemBertConfig configuration class: TFRemBertForMaskedLM (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForMaskedLM (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • TapasConfig configuration class: TFTapasForMaskedLM (TAPAS model)
    • XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMaskedLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForMaskedLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
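The two kwargs behaviours described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: `ToyConfig` and `split_kwargs` are hypothetical names, and the real split happens inside `from_pretrained()`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for a PretrainedConfig subclass.
    hidden_size: int = 768
    output_attentions: bool = False


def split_kwargs(config: Optional[ToyConfig], **kwargs: Any) -> tuple[ToyConfig, dict]:
    """Hypothetical helper: return (config, model_kwargs) following the two rules above."""
    if config is not None:
        # A config was passed explicitly: every kwarg goes straight to the model's __init__.
        return config, dict(kwargs)
    # No config: keys matching a config attribute override it; the rest go to the model.
    config_keys = {f.name for f in fields(ToyConfig)}
    overrides = {k: v for k, v in kwargs.items() if k in config_keys}
    model_kwargs = {k: v for k, v in kwargs.items() if k not in config_keys}
    return ToyConfig(**overrides), model_kwargs


cfg, rest = split_kwargs(None, output_attentions=True, custom_flag=1)
print(cfg.output_attentions, rest)  # True {'custom_flag': 1}
```

Note that with an explicit config, `output_attentions=True` would not touch the config at all; it would be forwarded to the model constructor unchanged.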

Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForMaskedLM (ALBERT model)
  • bert: TFBertForMaskedLM (BERT model)
  • camembert: TFCamembertForMaskedLM (CamemBERT model)
  • convbert: TFConvBertForMaskedLM (ConvBERT model)
  • deberta: TFDebertaForMaskedLM (DeBERTa model)
  • deberta-v2: TFDebertaV2ForMaskedLM (DeBERTa-v2 model)
  • distilbert: TFDistilBertForMaskedLM (DistilBERT model)
  • electra: TFElectraForMaskedLM (ELECTRA model)
  • esm: TFEsmForMaskedLM (ESM model)
  • flaubert: TFFlaubertWithLMHeadModel (FlauBERT model)
  • funnel: TFFunnelForMaskedLM (Funnel Transformer model)
  • layoutlm: TFLayoutLMForMaskedLM (LayoutLM model)
  • longformer: TFLongformerForMaskedLM (Longformer model)
  • mobilebert: TFMobileBertForMaskedLM (MobileBERT model)
  • mpnet: TFMPNetForMaskedLM (MPNet model)
  • rembert: TFRemBertForMaskedLM (RemBERT model)
  • roberta: TFRobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForMaskedLM (RoFormer model)
  • tapas: TFTapasForMaskedLM (TAPAS model)
  • xlm: TFXLMWithLMHeadModel (XLM model)
  • xlm-roberta: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
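The primary selection rule can be pictured as an exact dictionary lookup keyed by the config's `model_type`. The sketch below is hypothetical: it maps a few of the `model_type` strings listed above to class *names* (plain strings, so no TensorFlow import is needed), and `select_class` is not a transformers function.

```python
# Subset of the model_type -> class-name mapping listed above (names only, no TF import).
TF_MASKED_LM_MAPPING = {
    "albert": "TFAlbertForMaskedLM",
    "bert": "TFBertForMaskedLM",
    "roberta": "TFRobertaForMaskedLM",
    "roberta-prelayernorm": "TFRobertaPreLayerNormForMaskedLM",
    "xlm": "TFXLMWithLMHeadModel",
}


def select_class(config_dict: dict) -> str:
    """Hypothetical helper: pick a class name from the config's model_type (exact match)."""
    model_type = config_dict.get("model_type")
    if model_type is None:
        # The real auto classes fall back to pattern matching here; this sketch just stops.
        raise ValueError("config has no model_type; a fallback strategy is needed")
    try:
        return TF_MASKED_LM_MAPPING[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type {model_type!r} for this auto class") from None


print(select_class({"model_type": "roberta-prelayernorm"}))  # TFRobertaPreLayerNormForMaskedLM
```

Because the lookup is an exact key match, `roberta-prelayernorm` resolves to its own class rather than to the shorter `roberta` entry.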

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMaskedLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMaskedLM.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForMaskedLM

class transformers.FlaxAutoModelForMaskedLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForMaskedLM (ALBERT model)
    • BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
    • BertConfig configuration class: FlaxBertForMaskedLM (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForMaskedLM (BigBird model)
    • DistilBertConfig configuration class: FlaxDistilBertForMaskedLM (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraForMaskedLM (ELECTRA model)
    • MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
    • RoFormerConfig configuration class: FlaxRoFormerForMaskedLM (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForMaskedLM (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
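from_config() dispatches on the *type* of the configuration object (as in the "XxxConfig configuration class: Model" list above) and builds a model whose weights are freshly initialized rather than loaded. The toy sketch below mirrors that idea; `BertLikeConfig`, `ToyMaskedLM`, `CONFIG_TO_MODEL`, and `toy_from_config` are hypothetical, and the random initialization only stands in for whatever a real model's `__init__` does.

```python
import random
from dataclasses import dataclass


@dataclass
class BertLikeConfig:
    # Hypothetical config class; its type is what drives dispatch.
    hidden_size: int = 4


class ToyMaskedLM:
    # Hypothetical model class.
    def __init__(self, config: BertLikeConfig, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.config = config
        # Weights are created from scratch here: nothing is read from a checkpoint.
        self.weights = [rng.gauss(0.0, 0.02) for _ in range(config.hidden_size)]


# Hypothetical dispatch table keyed by config class.
CONFIG_TO_MODEL = {BertLikeConfig: ToyMaskedLM}


def toy_from_config(config, **kwargs):
    """Hypothetical helper: instantiate the model class registered for type(config)."""
    model_cls = CONFIG_TO_MODEL.get(type(config))
    if model_cls is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    return model_cls(config, **kwargs)


model = toy_from_config(BertLikeConfig(hidden_size=8))
print(type(model).__name__, len(model.weights))  # ToyMaskedLM 8
```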

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForMaskedLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForMaskedLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model and loading the Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
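For the third form of pretrained_model_name_or_path (a single PyTorch state_dict file), the parameter list above requires both from_pt=True and an explicit config. A hypothetical pre-flight check could encode that rule before calling from_pretrained(). `check_single_file_load` is not part of transformers, and treating a `.bin` suffix as "single checkpoint file" is an assumption made only for illustration.

```python
from pathlib import PurePath
from typing import Any, Optional


def check_single_file_load(path: str, from_pt: bool = False, config: Optional[Any] = None) -> None:
    """Hypothetical check: a single .bin checkpoint needs from_pt=True and a config."""
    if PurePath(path).suffix != ".bin":
        return  # model ids and directories are resolved differently; nothing to check here
    if not from_pt:
        raise ValueError(f"{path} looks like a PyTorch state_dict file; pass from_pt=True")
    if config is None:
        raise ValueError("Loading from a single checkpoint file requires an explicit config")


# Mirrors the argument combination used in the example below.
check_single_file_load("./pt_model/bert_pytorch_model.bin", from_pt=True, config=object())
```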

Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForMaskedLM (ALBERT model)
  • bart: FlaxBartForConditionalGeneration (BART model)
  • bert: FlaxBertForMaskedLM (BERT model)
  • big_bird: FlaxBigBirdForMaskedLM (BigBird model)
  • distilbert: FlaxDistilBertForMaskedLM (DistilBERT model)
  • electra: FlaxElectraForMaskedLM (ELECTRA model)
  • mbart: FlaxMBartForConditionalGeneration (mBART model)
  • roberta: FlaxRobertaForMaskedLM (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForMaskedLM (RoFormer model)
  • xlm-roberta: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForMaskedLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForMaskedLM.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForMaskGeneration

class transformers.AutoModelForMaskGeneration

( *args, **kwargs )

TFAutoModelForMaskGeneration

class transformers.TFAutoModelForMaskGeneration

( *args, **kwargs )

AutoModelForSeq2SeqLM

class transformers.AutoModelForSeq2SeqLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BartConfig configuration class: BartForConditionalGeneration (BART model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusForConditionalGeneration (BigBird-Pegasus model)
    • BlenderbotConfig configuration class: BlenderbotForConditionalGeneration (Blenderbot model)
    • BlenderbotSmallConfig configuration class: BlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
    • EncoderDecoderConfig configuration class: EncoderDecoderModel (Encoder decoder model)
    • FSMTConfig configuration class: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
    • GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
    • GraniteSpeechConfig configuration class: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
    • LEDConfig configuration class: LEDForConditionalGeneration (LED model)
    • LongT5Config configuration class: LongT5ForConditionalGeneration (LongT5 model)
    • M2M100Config configuration class: M2M100ForConditionalGeneration (M2M100 model)
    • MBartConfig configuration class: MBartForConditionalGeneration (mBART model)
    • MT5Config configuration class: MT5ForConditionalGeneration (MT5 model)
    • MarianConfig configuration class: MarianMTModel (Marian model)
    • MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
    • NllbMoeConfig configuration class: NllbMoeForConditionalGeneration (NLLB-MOE model)
    • PLBartConfig configuration class: PLBartForConditionalGeneration (PLBart model)
    • PegasusConfig configuration class: PegasusForConditionalGeneration (Pegasus model)
    • PegasusXConfig configuration class: PegasusXForConditionalGeneration (PEGASUS-X model)
    • ProphetNetConfig configuration class: ProphetNetForConditionalGeneration (ProphetNet model)
    • Qwen2AudioConfig configuration class: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
    • SeamlessM4TConfig configuration class: SeamlessM4TForTextToText (SeamlessM4T model)
    • SeamlessM4Tv2Config configuration class: SeamlessM4Tv2ForTextToText (SeamlessM4Tv2 model)
    • SwitchTransformersConfig configuration class: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
    • T5Config configuration class: T5ForConditionalGeneration (T5 model)
    • T5GemmaConfig configuration class: T5GemmaForConditionalGeneration (T5Gemma model)
    • UMT5Config configuration class: UMT5ForConditionalGeneration (UMT5 model)
    • VoxtralConfig configuration class: VoxtralForConditionalGeneration (Voxtral model)
    • XLMProphetNetConfig configuration class: XLMProphetNetForConditionalGeneration (XLM-ProphetNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForSeq2SeqLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-t5/t5-base")
>>> model = AutoModelForSeq2SeqLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, check whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
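With output_loading_info=True, the returned dictionary reports missing and unexpected keys. Conceptually, these are set differences between the parameter names the model expects and the keys found in the checkpoint. The sketch below shows only that idea with plain sets; `loading_info` is a hypothetical helper and the parameter names are made up.

```python
def loading_info(expected_keys, checkpoint_keys):
    """Hypothetical helper: compare model parameter names with checkpoint keys."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    return {
        # Parameters the model needs but the checkpoint does not provide.
        "missing_keys": sorted(expected - found),
        # Checkpoint entries that match no model parameter.
        "unexpected_keys": sorted(found - expected),
        # This sketch never records errors; the key is kept only to mirror the description above.
        "error_msgs": [],
    }


info = loading_info(
    expected_keys=["encoder.weight", "lm_head.weight"],
    checkpoint_keys=["encoder.weight", "pooler.weight"],
)
print(info["missing_keys"], info["unexpected_keys"])  # ['lm_head.weight'] ['pooler.weight']
```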

Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bart: BartForConditionalGeneration (BART model)
  • bigbird_pegasus: BigBirdPegasusForConditionalGeneration (BigBird-Pegasus model)
  • blenderbot: BlenderbotForConditionalGeneration (Blenderbot model)
  • blenderbot-small: BlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
  • encoder-decoder: EncoderDecoderModel (Encoder decoder model)
  • fsmt: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
  • gptsan-japanese: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
  • granite_speech: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
  • led: LEDForConditionalGeneration (LED model)
  • longt5: LongT5ForConditionalGeneration (LongT5 model)
  • m2m_100: M2M100ForConditionalGeneration (M2M100 model)
  • marian: MarianMTModel (Marian model)
  • mbart: MBartForConditionalGeneration (mBART model)
  • mt5: MT5ForConditionalGeneration (MT5 model)
  • mvp: MvpForConditionalGeneration (MVP model)
  • nllb-moe: NllbMoeForConditionalGeneration (NLLB-MOE model)
  • pegasus: PegasusForConditionalGeneration (Pegasus model)
  • pegasus_x: PegasusXForConditionalGeneration (PEGASUS-X model)
  • plbart: PLBartForConditionalGeneration (PLBart model)
  • prophetnet: ProphetNetForConditionalGeneration (ProphetNet model)
  • qwen2_audio: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
  • seamless_m4t: SeamlessM4TForTextToText (SeamlessM4T model)
  • seamless_m4t_v2: SeamlessM4Tv2ForTextToText (SeamlessM4Tv2 model)
  • switch_transformers: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
  • t5: T5ForConditionalGeneration (T5 model)
  • t5gemma: T5GemmaForConditionalGeneration (T5Gemma model)
  • umt5: UMT5ForConditionalGeneration (UMT5 model)
  • voxtral: VoxtralForConditionalGeneration (Voxtral model)
  • xlm-prophetnet: XLMProphetNetForConditionalGeneration (XLM-ProphetNet model)
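When model_type is missing, the fallback matches patterns against pretrained_model_name_or_path. Names in the list above such as t5, mt5, umt5, and longt5 overlap as substrings, so a naive first-match check would be ambiguous. The sketch below assumes a longest-pattern-first substring search; this ordering is believed to be close to what AutoConfig does, but treat it as an assumption. `guess_model_type` is hypothetical.

```python
from typing import Optional

# A few model_type strings from the list above that overlap as substrings.
SEQ2SEQ_TYPES = ["t5", "mt5", "umt5", "longt5", "t5gemma", "bart", "mbart"]


def guess_model_type(name_or_path: str) -> Optional[str]:
    """Hypothetical fallback: return the longest known model_type contained in the name."""
    # Trying longer patterns first ensures "mt5" wins over "t5" and "mbart" over "bart".
    for pattern in sorted(SEQ2SEQ_TYPES, key=len, reverse=True):
        if pattern in name_or_path:
            return pattern
    return None


print(guess_model_type("google/mt5-small"))  # mt5
```

Substring matching is inherently fragile: a repository name that does not contain its architecture name (or contains a misleading one) will not resolve correctly, which is why a config.json with model_type is the reliable path.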

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().
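What model.eval() changes can be illustrated with a toy dropout layer that only drops values while a `training` flag is set. This plain-Python sketch imitates the flag-based behaviour of PyTorch modules; `ToyDropout` is hypothetical and is not torch.nn.Dropout.

```python
import random


class ToyDropout:
    """Hypothetical toy: inverted dropout that is active only in training mode."""

    def __init__(self, p: float = 0.5, seed: int = 0) -> None:
        self.p = p
        self.training = True
        self._rng = random.Random(seed)

    def train(self, mode: bool = True) -> "ToyDropout":
        self.training = mode
        return self

    def eval(self) -> "ToyDropout":
        return self.train(False)

    def __call__(self, xs: list[float]) -> list[float]:
        if not self.training:
            return list(xs)  # evaluation mode: values pass through unchanged
        keep = 1.0 - self.p
        # Zero each value with probability p and rescale survivors by 1/keep.
        return [x / keep if self._rng.random() < keep else 0.0 for x in xs]


layer = ToyDropout(p=0.5).eval()
print(layer([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

Calling `layer.train()` re-enables the random dropping, just as model.train() would before fine-tuning.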

Examples:

>>> from transformers import AutoConfig, AutoModelForSeq2SeqLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")

>>> # Update configuration during loading
>>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/t5_tf_model_config.json")
>>> model = AutoModelForSeq2SeqLM.from_pretrained(
...     "./tf_model/t5_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForSeq2SeqLM

class transformers.TFAutoModelForSeq2SeqLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BartConfig configuration class: TFBartForConditionalGeneration (BART model)
    • BlenderbotConfig configuration class: TFBlenderbotForConditionalGeneration (Blenderbot model)
    • BlenderbotSmallConfig configuration class: TFBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
    • EncoderDecoderConfig configuration class: TFEncoderDecoderModel (Encoder decoder model)
    • LEDConfig configuration class: TFLEDForConditionalGeneration (LED model)
    • MBartConfig configuration class: TFMBartForConditionalGeneration (mBART model)
    • MT5Config configuration class: TFMT5ForConditionalGeneration (MT5 model)
    • MarianConfig configuration class: TFMarianMTModel (Marian model)
    • PegasusConfig configuration class: TFPegasusForConditionalGeneration (Pegasus model)
    • T5Config configuration class: TFT5ForConditionalGeneration (T5 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSeq2SeqLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-t5/t5-base")
>>> model = TFAutoModelForSeq2SeqLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bart: TFBartForConditionalGeneration (BART model)
  • blenderbot: TFBlenderbotForConditionalGeneration (Blenderbot model)
  • blenderbot-small: TFBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
  • encoder-decoder: TFEncoderDecoderModel (Encoder decoder model)
  • led: TFLEDForConditionalGeneration (LED model)
  • marian: TFMarianMTModel (Marian model)
  • mbart: TFMBartForConditionalGeneration (mBART model)
  • mt5: TFMT5ForConditionalGeneration (MT5 model)
  • pegasus: TFPegasusForConditionalGeneration (Pegasus model)
  • t5: TFT5ForConditionalGeneration (T5 model)
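A minimal sketch of this selection, assuming the fallback is a substring test on pretrained_model_name_or_path that tries longer model_type keys first (so that "mt5" is not swallowed by "t5", nor "blenderbot-small" by "blenderbot"). resolve_class_name is a hypothetical helper that returns class names as strings. It is not part of the library.

```python
# The model_type -> class-name mapping listed above.
TF_SEQ2SEQ_MAPPING = {
    "bart": "TFBartForConditionalGeneration",
    "blenderbot": "TFBlenderbotForConditionalGeneration",
    "blenderbot-small": "TFBlenderbotSmallForConditionalGeneration",
    "encoder-decoder": "TFEncoderDecoderModel",
    "led": "TFLEDForConditionalGeneration",
    "marian": "TFMarianMTModel",
    "mbart": "TFMBartForConditionalGeneration",
    "mt5": "TFMT5ForConditionalGeneration",
    "pegasus": "TFPegasusForConditionalGeneration",
    "t5": "TFT5ForConditionalGeneration",
}


def resolve_class_name(model_type, name_or_path):
    """Hypothetical helper: prefer model_type, else match substrings of the name."""
    if model_type is not None:
        return TF_SEQ2SEQ_MAPPING[model_type]
    # Assumption: longer keys are tried first, so the most specific pattern wins.
    for pattern in sorted(TF_SEQ2SEQ_MAPPING, key=len, reverse=True):
        if pattern in str(name_or_path):
            return TF_SEQ2SEQ_MAPPING[pattern]
    raise ValueError(f"Unrecognized model in {name_or_path!r}")


print(resolve_class_name(None, "google/mt5-small"))  # TFMT5ForConditionalGeneration
```

Because the name-based fallback is only a heuristic, an explicit model_type in the configuration is the more reliable signal whenever it is available.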

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSeq2SeqLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")

>>> # Update configuration during loading
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/t5_pt_model_config.json")
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(
...     "./pt_model/t5_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForSeq2SeqLM

class transformers.FlaxAutoModelForSeq2SeqLM

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
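The "cannot be instantiated directly" behavior, combined with dispatch on the configuration class, can be sketched as follows. ToyAutoModel, ToyConfig, and ToyModel are hypothetical. The exception type and the register()/from_config() bodies are simplified assumptions, not the library's code.

```python
class ToyAutoModel:
    """Hypothetical auto class: no direct construction, only factory methods."""

    _registry = {}  # maps a configuration class to a concrete model class

    def __init__(self, *args, **kwargs):
        # Direct construction is refused; callers must use a factory method.
        raise EnvironmentError(
            f"{type(self).__name__} is designed to be instantiated with "
            f"{type(self).__name__}.from_pretrained(...) or "
            f"{type(self).__name__}.from_config(config)."
        )

    @classmethod
    def register(cls, config_class, model_class):
        cls._registry[config_class] = model_class

    @classmethod
    def from_config(cls, config):
        # Dispatch on the exact type of the configuration object.
        model_class = cls._registry.get(type(config))
        if model_class is None:
            raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
        return model_class(config)


class ToyConfig:
    pass


class ToyModel:
    def __init__(self, config):
        self.config = config


ToyAutoModel.register(ToyConfig, ToyModel)
model = ToyAutoModel.from_config(ToyConfig())  # returns a ToyModel, not a ToyAutoModel
```

Keeping __init__ unusable forces every caller through a factory method, which is where the concrete class is chosen.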

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
    • BlenderbotConfig configuration class: FlaxBlenderbotForConditionalGeneration (Blenderbot model)
    • BlenderbotSmallConfig configuration class: FlaxBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
    • EncoderDecoderConfig configuration class: FlaxEncoderDecoderModel (Encoder decoder model)
    • LongT5Config configuration class: FlaxLongT5ForConditionalGeneration (LongT5 model)
    • MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
    • MT5Config configuration class: FlaxMT5ForConditionalGeneration (MT5 model)
    • MarianConfig configuration class: FlaxMarianMTModel (Marian model)
    • PegasusConfig configuration class: FlaxPegasusForConditionalGeneration (Pegasus model)
    • T5Config configuration class: FlaxT5ForConditionalGeneration (T5 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSeq2SeqLM

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-t5/t5-base")
>>> model = FlaxAutoModelForSeq2SeqLM.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model ahead of time and loading the Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
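The proxies example above mixes a protocol key ("http") with a protocol-plus-host key ("http://hostname"). The sketch below shows how such a dictionary can be resolved, trying the most specific key first. It is modeled on the lookup order of the requests library's select_proxy() helper. Whether Hub downloads resolve proxies in exactly this way is an assumption, and the select_proxy defined here is a local re-implementation.

```python
from urllib.parse import urlparse


def select_proxy(url, proxies):
    """Pick the proxy for url, most specific key first (requests-style order)."""
    proxies = proxies or {}
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    candidate_keys = [
        f"{parts.scheme}://{parts.hostname}",  # protocol + host, e.g. "http://hostname"
        parts.scheme,                          # protocol only, e.g. "http"
        f"all://{parts.hostname}",             # any protocol to this host
        "all",                                 # catch-all entry
    ]
    for key in candidate_keys:
        if key in proxies:
            return proxies[key]
    return None  # no matching entry: connect directly


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/model.bin", proxies))  # foo.bar:4012
```

With this dictionary, plain-HTTP requests to other hosts fall back to the "http" entry, while HTTPS requests match nothing and connect directly.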

Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bart: FlaxBartForConditionalGeneration (BART model)
  • blenderbot: FlaxBlenderbotForConditionalGeneration (Blenderbot model)
  • blenderbot-small: FlaxBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
  • encoder-decoder: FlaxEncoderDecoderModel (Encoder decoder model)
  • longt5: FlaxLongT5ForConditionalGeneration (LongT5 model)
  • marian: FlaxMarianMTModel (Marian model)
  • mbart: FlaxMBartForConditionalGeneration (mBART model)
  • mt5: FlaxMT5ForConditionalGeneration (MT5 model)
  • pegasus: FlaxPegasusForConditionalGeneration (Pegasus model)
  • t5: FlaxT5ForConditionalGeneration (T5 model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSeq2SeqLM

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/t5_pt_model_config.json")
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained(
...     "./pt_model/t5_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForSequenceClassification

class transformers.AutoModelForSequenceClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForSequenceClassification (ALBERT model)
    • ArceeConfig configuration class: ArceeForSequenceClassification (Arcee model)
    • BartConfig configuration class: BartForSequenceClassification (BART model)
    • BertConfig configuration class: BertForSequenceClassification (BERT model)
    • BigBirdConfig configuration class: BigBirdForSequenceClassification (BigBird model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusForSequenceClassification (BigBird-Pegasus model)
    • BioGptConfig configuration class: BioGptForSequenceClassification (BioGpt model)
    • BloomConfig configuration class: BloomForSequenceClassification (BLOOM model)
    • CTRLConfig configuration class: CTRLForSequenceClassification (CTRL model)
    • CamembertConfig configuration class: CamembertForSequenceClassification (CamemBERT model)
    • CanineConfig configuration class: CanineForSequenceClassification (CANINE model)
    • ConvBertConfig configuration class: ConvBertForSequenceClassification (ConvBERT model)
    • Data2VecTextConfig configuration class: Data2VecTextForSequenceClassification (Data2VecText model)
    • DebertaConfig configuration class: DebertaForSequenceClassification (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2ForSequenceClassification (DeBERTa-v2 model)
    • DeepseekV2Config configuration class: DeepseekV2ForSequenceClassification (DeepSeek-V2 model)
    • DeepseekV3Config configuration class: DeepseekV3ForSequenceClassification (DeepSeek-V3 model)
    • DiffLlamaConfig configuration class: DiffLlamaForSequenceClassification (DiffLlama model)
    • DistilBertConfig configuration class: DistilBertForSequenceClassification (DistilBERT model)
    • DogeConfig configuration class: DogeForSequenceClassification (Doge model)
    • ElectraConfig configuration class: ElectraForSequenceClassification (ELECTRA model)
    • ErnieConfig configuration class: ErnieForSequenceClassification (ERNIE model)
    • ErnieMConfig configuration class: ErnieMForSequenceClassification (ErnieM model)
    • EsmConfig configuration class: EsmForSequenceClassification (ESM model)
    • Exaone4Config configuration class: Exaone4ForSequenceClassification (EXAONE-4.0 model)
    • FNetConfig configuration class: FNetForSequenceClassification (FNet model)
    • FalconConfig configuration class: FalconForSequenceClassification (Falcon model)
    • FlaubertConfig configuration class: FlaubertForSequenceClassification (FlauBERT model)
    • FunnelConfig configuration class: FunnelForSequenceClassification (Funnel Transformer model)
    • GPT2Config configuration class: GPT2ForSequenceClassification (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeForSequenceClassification (GPTBigCode model)
    • GPTJConfig configuration class: GPTJForSequenceClassification (GPT-J model)
    • GPTNeoConfig configuration class: GPTNeoForSequenceClassification (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXForSequenceClassification (GPT NeoX model)
    • Gemma2Config configuration class: Gemma2ForSequenceClassification (Gemma2 model)
    • Gemma3Config configuration class: Gemma3ForSequenceClassification (Gemma3ForConditionalGeneration model)
    • GemmaConfig configuration class: GemmaForSequenceClassification (Gemma model)
    • Glm4Config configuration class: Glm4ForSequenceClassification (GLM4 model)
    • GlmConfig configuration class: GlmForSequenceClassification (GLM model)
    • GptOssConfig configuration class: GptOssForSequenceClassification (GptOss model)
    • HeliumConfig configuration class: HeliumForSequenceClassification (Helium model)
    • HunYuanDenseV1Config configuration class: HunYuanDenseV1ForSequenceClassification (HunYuanDenseV1 model)
    • HunYuanMoEV1Config configuration class: HunYuanMoEV1ForSequenceClassification (HunYuanMoeV1 model)
    • IBertConfig configuration class: IBertForSequenceClassification (I-BERT model)
    • JambaConfig configuration class: JambaForSequenceClassification (Jamba model)
    • JetMoeConfig configuration class: JetMoeForSequenceClassification (JetMoe model)
    • LEDConfig configuration class: LEDForSequenceClassification (LED model)
    • LayoutLMConfig configuration class: LayoutLMForSequenceClassification (LayoutLM model)
    • LayoutLMv2Config configuration class: LayoutLMv2ForSequenceClassification (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3ForSequenceClassification (LayoutLMv3 model)
    • LiltConfig configuration class: LiltForSequenceClassification (LiLT model)
    • LlamaConfig configuration class: LlamaForSequenceClassification (LLaMA model)
    • LongformerConfig configuration class: LongformerForSequenceClassification (Longformer model)
    • LukeConfig configuration class: LukeForSequenceClassification (LUKE model)
    • MBartConfig configuration class: MBartForSequenceClassification (mBART model)
    • MPNetConfig configuration class: MPNetForSequenceClassification (MPNet model)
    • MT5Config configuration class: MT5ForSequenceClassification (MT5 model)
    • MarkupLMConfig configuration class: MarkupLMForSequenceClassification (MarkupLM model)
    • MegaConfig configuration class: MegaForSequenceClassification (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForSequenceClassification (Megatron-BERT model)
    • MiniMaxConfig configuration class: MiniMaxForSequenceClassification (MiniMax model)
    • MistralConfig configuration class: MistralForSequenceClassification (Mistral model)
    • MixtralConfig configuration class: MixtralForSequenceClassification (Mixtral model)
    • MobileBertConfig configuration class: MobileBertForSequenceClassification (MobileBERT model)
    • ModernBertConfig configuration class: ModernBertForSequenceClassification (ModernBERT model)
    • ModernBertDecoderConfig configuration class: ModernBertDecoderForSequenceClassification (ModernBertDecoder model)
    • MptConfig configuration class: MptForSequenceClassification (MPT model)
    • MraConfig configuration class: MraForSequenceClassification (MRA model)
    • MvpConfig configuration class: MvpForSequenceClassification (MVP model)
    • NemotronConfig configuration class: NemotronForSequenceClassification (Nemotron model)
    • NezhaConfig configuration class: NezhaForSequenceClassification (Nezha model)
    • NystromformerConfig configuration class: NystromformerForSequenceClassification (Nyströmformer model)
    • OPTConfig configuration class: OPTForSequenceClassification (OPT model)
    • OpenAIGPTConfig configuration class: OpenAIGPTForSequenceClassification (OpenAI GPT model)
    • OpenLlamaConfig configuration class: OpenLlamaForSequenceClassification (OpenLlama model)
    • PLBartConfig configuration class: PLBartForSequenceClassification (PLBart model)
    • PerceiverConfig configuration class: PerceiverForSequenceClassification (Perceiver model)
    • PersimmonConfig configuration class: PersimmonForSequenceClassification (Persimmon model)
    • Phi3Config configuration class: Phi3ForSequenceClassification (Phi3 model)
    • PhiConfig configuration class: PhiForSequenceClassification (Phi model)
    • PhimoeConfig configuration class: PhimoeForSequenceClassification (Phimoe model)
    • QDQBertConfig configuration class: QDQBertForSequenceClassification (QDQBert model)
    • Qwen2Config configuration class: Qwen2ForSequenceClassification (Qwen2 model)
    • Qwen2MoeConfig configuration class: Qwen2MoeForSequenceClassification (Qwen2MoE model)
    • Qwen3Config configuration class: Qwen3ForSequenceClassification (Qwen3 model)
    • Qwen3MoeConfig configuration class: Qwen3MoeForSequenceClassification (Qwen3MoE model)
    • ReformerConfig configuration class: ReformerForSequenceClassification (Reformer model)
    • RemBertConfig configuration class: RemBertForSequenceClassification (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForSequenceClassification (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForSequenceClassification (RoFormer model)
    • RobertaConfig configuration class: RobertaForSequenceClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
    • SeedOssConfig configuration class: SeedOssForSequenceClassification (SeedOss model)
    • SmolLM3Config configuration class: SmolLM3ForSequenceClassification (SmolLM3 model)
    • SqueezeBertConfig configuration class: SqueezeBertForSequenceClassification (SqueezeBERT model)
    • StableLmConfig configuration class: StableLmForSequenceClassification (StableLm model)
    • Starcoder2Config configuration class: Starcoder2ForSequenceClassification (Starcoder2 model)
    • T5Config configuration class: T5ForSequenceClassification (T5 model)
    • T5GemmaConfig configuration class: T5GemmaForSequenceClassification (T5Gemma model)
    • TapasConfig configuration class: TapasForSequenceClassification (TAPAS model)
    • TransfoXLConfig configuration class: TransfoXLForSequenceClassification (Transformer-XL model)
    • UMT5Config configuration class: UMT5ForSequenceClassification (UMT5 model)
    • XLMConfig configuration class: XLMForSequenceClassification (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForSequenceClassification (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForSequenceClassification (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetForSequenceClassification (XLNet model)
    • XmodConfig configuration class: XmodForSequenceClassification (X-MOD model)
    • YosoConfig configuration class: YosoForSequenceClassification (YOSO model)
    • Zamba2Config configuration class: Zamba2ForSequenceClassification (Zamba2 model)
    • ZambaConfig configuration class: ZambaForSequenceClassification (Zamba model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
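The defaulting rule for attn_implementation can be sketched as a small decision function. choose_attn_implementation is hypothetical and takes the torch version as a tuple. It only encodes the rule stated above and ignores any per-model support checks the library may perform. In practice the choice is passed as a keyword argument, for example from_config(config, attn_implementation="eager").

```python
VALID_IMPLEMENTATIONS = {"eager", "sdpa", "flash_attention_2"}


def choose_attn_implementation(requested, torch_version, sdpa_available=True):
    """Hypothetical helper applying the defaulting rule described above.

    torch_version is a tuple such as (2, 1, 1).
    """
    if requested is not None:
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested  # an explicit choice is used as-is
    # No explicit choice: SDPA when available on torch>=2.1.1, otherwise "eager".
    if sdpa_available and torch_version >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(choose_attn_implementation(None, (2, 3, 0)))  # sdpa
```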

Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
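To make the distinction concrete, the toy class below contrasts the two constructors. from_config() builds parameters from scratch (random initialization in this sketch), while from_pretrained() restores stored weights together with their configuration. TinyLinear and its dictionary checkpoint format are hypothetical and stand in for a real model and its saved files.

```python
import random


class TinyLinear:
    """Hypothetical one-layer model used only to contrast the two constructors."""

    def __init__(self, config, weights=None):
        self.config = config
        if weights is None:
            # from_config path: fresh random parameters (std 0.02 is arbitrary here).
            rng = random.Random()
            weights = [rng.gauss(0.0, 0.02) for _ in range(config["size"])]
        self.weights = list(weights)

    @classmethod
    def from_config(cls, config):
        return cls(config)  # configuration only, no stored weights

    @classmethod
    def from_pretrained(cls, checkpoint):
        # A checkpoint bundles a configuration *and* trained weights.
        return cls(checkpoint["config"], weights=checkpoint["weights"])


checkpoint = {"config": {"size": 4}, "weights": [0.1, -0.2, 0.3, 0.05]}
pretrained = TinyLinear.from_pretrained(checkpoint)
fresh = TinyLinear.from_config(checkpoint["config"])  # same shape, untrained values
```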

Examples:

>>> from transformers import AutoConfig, AutoModelForSequenceClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForSequenceClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, consider whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
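With output_loading_info=True, the loading report lists parameter names that are present on only one side. The sketch below reproduces the missing/unexpected split with plain sets. loading_info is a hypothetical helper, and the parameter names are illustrative rather than taken from a real checkpoint. A typical case for this class is a base checkpoint without a classification head: the head's parameters show up as missing and are newly initialized.

```python
def loading_info(model_keys, checkpoint_keys):
    """Hypothetical helper: compare parameter names the way a loading report does."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint: newly initialized.
        "missing_keys": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but with no matching model parameter: ignored.
        "unexpected_keys": sorted(checkpoint_keys - model_keys),
    }


# Illustrative names: a base-encoder checkpoint loaded into a model that
# adds a sequence classification head on top.
model_keys = ["encoder.layer.0.weight", "classifier.weight", "classifier.bias"]
checkpoint_keys = ["encoder.layer.0.weight", "pretraining_head.weight"]
info = loading_info(model_keys, checkpoint_keys)
print(info["missing_keys"])  # ['classifier.bias', 'classifier.weight']
```

The real report also carries error messages, which this sketch does not model.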

Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForSequenceClassification (ALBERT model)
  • arcee: ArceeForSequenceClassification (Arcee model)
  • bart: BartForSequenceClassification (BART model)
  • bert: BertForSequenceClassification (BERT model)
  • big_bird: BigBirdForSequenceClassification (BigBird model)
  • bigbird_pegasus: BigBirdPegasusForSequenceClassification (BigBird-Pegasus model)
  • biogpt: BioGptForSequenceClassification (BioGpt model)
  • bloom: BloomForSequenceClassification (BLOOM model)
  • camembert: CamembertForSequenceClassification (CamemBERT model)
  • canine: CanineForSequenceClassification (CANINE model)
  • code_llama: LlamaForSequenceClassification (CodeLlama model)
  • convbert: ConvBertForSequenceClassification (ConvBERT model)
  • ctrl: CTRLForSequenceClassification (CTRL model)
  • data2vec-text: Data2VecTextForSequenceClassification (Data2VecText model)
  • deberta: DebertaForSequenceClassification (DeBERTa model)
  • deberta-v2: DebertaV2ForSequenceClassification (DeBERTa-v2 model)
  • deepseek_v2: DeepseekV2ForSequenceClassification (DeepSeek-V2 model)
  • deepseek_v3: DeepseekV3ForSequenceClassification (DeepSeek-V3 model)
  • diffllama: DiffLlamaForSequenceClassification (DiffLlama model)
  • distilbert: DistilBertForSequenceClassification (DistilBERT model)
  • doge: DogeForSequenceClassification (Doge model)
  • electra: ElectraForSequenceClassification (ELECTRA model)
  • ernie: ErnieForSequenceClassification (ERNIE model)
  • ernie_m: ErnieMForSequenceClassification (ErnieM model)
  • esm: EsmForSequenceClassification (ESM model)
  • exaone4: Exaone4ForSequenceClassification (EXAONE-4.0 model)
  • falcon: FalconForSequenceClassification (Falcon model)
  • flaubert: FlaubertForSequenceClassification (FlauBERT model)
  • fnet: FNetForSequenceClassification (FNet model)
  • funnel: FunnelForSequenceClassification (Funnel Transformer model)
  • gemma: GemmaForSequenceClassification (Gemma model)
  • gemma2: Gemma2ForSequenceClassification (Gemma2 model)
  • gemma3: Gemma3ForSequenceClassification (Gemma3ForConditionalGeneration model)
  • glm: GlmForSequenceClassification (GLM model)
  • glm4: Glm4ForSequenceClassification (GLM4 model)
  • gpt-sw3: GPT2ForSequenceClassification (GPT-Sw3 model)
  • gpt2: GPT2ForSequenceClassification (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeForSequenceClassification (GPTBigCode model)
  • gpt_neo: GPTNeoForSequenceClassification (GPT Neo model)
  • gpt_neox: GPTNeoXForSequenceClassification (GPT NeoX model)
  • gpt_oss: GptOssForSequenceClassification (GptOss model)
  • gptj: GPTJForSequenceClassification (GPT-J model)
  • helium: HeliumForSequenceClassification (Helium model)
  • hunyuan_v1_dense: HunYuanDenseV1ForSequenceClassification (HunYuanDenseV1 model)
  • hunyuan_v1_moe: HunYuanMoEV1ForSequenceClassification (HunYuanMoeV1 model)
  • ibert: IBertForSequenceClassification (I-BERT model)
  • jamba: JambaForSequenceClassification (Jamba model)
  • jetmoe: JetMoeForSequenceClassification (JetMoe model)
  • layoutlm: LayoutLMForSequenceClassification (LayoutLM model)
  • layoutlmv2: LayoutLMv2ForSequenceClassification (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3ForSequenceClassification (LayoutLMv3 model)
  • led: LEDForSequenceClassification (LED model)
  • lilt: LiltForSequenceClassification (LiLT model)
  • llama: LlamaForSequenceClassification (LLaMA model)
  • longformer: LongformerForSequenceClassification (Longformer model)
  • luke: LukeForSequenceClassification (LUKE model)
  • markuplm: MarkupLMForSequenceClassification (MarkupLM model)
  • mbart: MBartForSequenceClassification (mBART model)
  • mega: MegaForSequenceClassification (MEGA model)
  • megatron-bert: MegatronBertForSequenceClassification (Megatron-BERT model)
  • minimax: MiniMaxForSequenceClassification (MiniMax model)
  • mistral: MistralForSequenceClassification (Mistral model)
  • mixtral: MixtralForSequenceClassification (Mixtral model)
  • mobilebert: MobileBertForSequenceClassification (MobileBERT model)
  • modernbert: ModernBertForSequenceClassification (ModernBERT model)
  • modernbert-decoder: ModernBertDecoderForSequenceClassification (ModernBertDecoder model)
  • mpnet: MPNetForSequenceClassification (MPNet model)
  • mpt: MptForSequenceClassification (MPT model)
  • mra: MraForSequenceClassification (MRA model)
  • mt5: MT5ForSequenceClassification (MT5 model)
  • mvp: MvpForSequenceClassification (MVP model)
  • nemotron: NemotronForSequenceClassification (Nemotron model)
  • nezha: NezhaForSequenceClassification (Nezha model)
  • nystromformer: NystromformerForSequenceClassification (Nyströmformer model)
  • open-llama: OpenLlamaForSequenceClassification (OpenLlama model)
  • openai-gpt: OpenAIGPTForSequenceClassification (OpenAI GPT model)
  • opt: OPTForSequenceClassification (OPT model)
  • perceiver: PerceiverForSequenceClassification (Perceiver model)
  • persimmon: PersimmonForSequenceClassification (Persimmon model)
  • phi: PhiForSequenceClassification (Phi model)
  • phi3: Phi3ForSequenceClassification (Phi3 model)
  • phimoe: PhimoeForSequenceClassification (Phimoe model)
  • plbart: PLBartForSequenceClassification (PLBart model)
  • qdqbert: QDQBertForSequenceClassification (QDQBert model)
  • qwen2: Qwen2ForSequenceClassification (Qwen2 model)
  • qwen2_moe: Qwen2MoeForSequenceClassification (Qwen2MoE model)
  • qwen3: Qwen3ForSequenceClassification (Qwen3 model)
  • qwen3_moe: Qwen3MoeForSequenceClassification (Qwen3MoE model)
  • reformer: ReformerForSequenceClassification (Reformer model)
  • rembert: RemBertForSequenceClassification (RemBERT model)
  • roberta: RobertaForSequenceClassification (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForSequenceClassification (RoCBert model)
  • roformer: RoFormerForSequenceClassification (RoFormer model)
  • seed_oss: SeedOssForSequenceClassification (SeedOss model)
  • smollm3: SmolLM3ForSequenceClassification (SmolLM3 model)
  • squeezebert: SqueezeBertForSequenceClassification (SqueezeBERT model)
  • stablelm: StableLmForSequenceClassification (StableLm model)
  • starcoder2: Starcoder2ForSequenceClassification (Starcoder2 model)
  • t5: T5ForSequenceClassification (T5 model)
  • t5gemma: T5GemmaForSequenceClassification (T5Gemma model)
  • tapas: TapasForSequenceClassification (TAPAS model)
  • transfo-xl: TransfoXLForSequenceClassification (Transformer-XL model)
  • umt5: UMT5ForSequenceClassification (UMT5 model)
  • xlm: XLMForSequenceClassification (XLM model)
  • xlm-roberta: XLMRobertaForSequenceClassification (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForSequenceClassification (XLM-RoBERTa-XL model)
  • xlnet: XLNetForSequenceClassification (XLNet model)
  • xmod: XmodForSequenceClassification (X-MOD model)
  • yoso: YosoForSequenceClassification (YOSO model)
  • zamba: ZambaForSequenceClassification (Zamba model)
  • zamba2: Zamba2ForSequenceClassification (Zamba2 model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForSequenceClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSequenceClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForSequenceClassification

class transformers.TFAutoModelForSequenceClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForSequenceClassification (ALBERT model)
    • BartConfig configuration class: TFBartForSequenceClassification (BART model)
    • BertConfig configuration class: TFBertForSequenceClassification (BERT model)
    • CTRLConfig configuration class: TFCTRLForSequenceClassification (CTRL model)
    • CamembertConfig configuration class: TFCamembertForSequenceClassification (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertForSequenceClassification (ConvBERT model)
    • DebertaConfig configuration class: TFDebertaForSequenceClassification (DeBERTa model)
    • DebertaV2Config configuration class: TFDebertaV2ForSequenceClassification (DeBERTa-v2 model)
    • DistilBertConfig configuration class: TFDistilBertForSequenceClassification (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForSequenceClassification (ELECTRA model)
    • EsmConfig configuration class: TFEsmForSequenceClassification (ESM model)
    • FlaubertConfig configuration class: TFFlaubertForSequenceClassification (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForSequenceClassification (Funnel Transformer model)
    • GPT2Config configuration class: TFGPT2ForSequenceClassification (OpenAI GPT-2 model)
    • GPTJConfig configuration class: TFGPTJForSequenceClassification (GPT-J model)
    • LayoutLMConfig configuration class: TFLayoutLMForSequenceClassification (LayoutLM model)
    • LayoutLMv3Config configuration class: TFLayoutLMv3ForSequenceClassification (LayoutLMv3 model)
    • LongformerConfig configuration class: TFLongformerForSequenceClassification (Longformer model)
    • MPNetConfig configuration class: TFMPNetForSequenceClassification (MPNet model)
    • MistralConfig configuration class: TFMistralForSequenceClassification (Mistral model)
    • MobileBertConfig configuration class: TFMobileBertForSequenceClassification (MobileBERT model)
    • OpenAIGPTConfig configuration class: TFOpenAIGPTForSequenceClassification (OpenAI GPT model)
    • RemBertConfig configuration class: TFRemBertForSequenceClassification (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForSequenceClassification (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForSequenceClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
    • TapasConfig configuration class: TFTapasForSequenceClassification (TAPAS model)
    • TransfoXLConfig configuration class: TFTransfoXLForSequenceClassification (Transformer-XL model)
    • XLMConfig configuration class: TFXLMForSequenceClassification (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForSequenceClassification (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetForSequenceClassification (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
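The selection rule above is keyed on the configuration class, and the model is built from the configuration alone, without reading any checkpoint. A minimal pure-Python sketch of that dispatch follows; BertLikeConfig, AlbertLikeConfig, their model classes and from_config here are hypothetical stand-ins, not transformers classes or the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class BertLikeConfig:
    # Hypothetical stand-in for a configuration class.
    num_labels: int = 2


@dataclass
class AlbertLikeConfig:
    # A second, distinct hypothetical configuration class.
    num_labels: int = 2


class BertLikeForSequenceClassification:
    def __init__(self, config):
        self.config = config
        # Built from the config only: no checkpoint is read, so weights are untrained.
        self.weights_loaded = False


class AlbertLikeForSequenceClassification(BertLikeForSequenceClassification):
    pass


# Keyed on the configuration class, like the "XConfig configuration class: Y" list.
MODEL_FOR_CONFIG = {
    BertLikeConfig: BertLikeForSequenceClassification,
    AlbertLikeConfig: AlbertLikeForSequenceClassification,
}


def from_config(config):
    """Choose the head from type(config) and instantiate it without loading weights."""
    model_cls = MODEL_FOR_CONFIG.get(type(config))
    if model_cls is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    return model_cls(config)


model = from_config(AlbertLikeConfig(num_labels=3))
print(type(model).__name__, model.config.num_labels, model.weights_loaded)
# AlbertLikeForSequenceClassification 3 False
```

Because the lookup uses type(config), two configurations with identical attribute values can still resolve to different heads.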

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSequenceClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForSequenceClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
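The two kwargs cases above can be sketched in plain Python. ToyConfig and split_kwargs are hypothetical names used only for illustration; in the library the override step happens inside the configuration's from_pretrained(), which this sketch does not reproduce.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class ToyConfig:
    # Hypothetical configuration attributes.
    output_attentions: bool = False
    num_labels: int = 2


def split_kwargs(loaded_config, kwargs, config=None):
    """Return (config_for_model, kwargs_for_model_init)."""
    if config is not None:
        # Explicit config: assume it is final and forward every kwarg to __init__.
        return config, dict(kwargs)
    attr_names = {f.name for f in fields(loaded_config)}
    overrides = {k: v for k, v in kwargs.items() if k in attr_names}
    remaining = {k: v for k, v in kwargs.items() if k not in attr_names}
    # Matching keys override the loaded config; the rest go to the model's __init__.
    return replace(loaded_config, **overrides), remaining


cfg, rest = split_kwargs(ToyConfig(), {"output_attentions": True, "my_model_arg": 1})
print(cfg.output_attentions, rest)  # True {'my_model_arg': 1}
```

Note that with an explicit config, output_attentions=True would not change the config at all; it would be handed to the model's __init__ instead.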

Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForSequenceClassification (ALBERT model)
  • bart: TFBartForSequenceClassification (BART model)
  • bert: TFBertForSequenceClassification (BERT model)
  • camembert: TFCamembertForSequenceClassification (CamemBERT model)
  • convbert: TFConvBertForSequenceClassification (ConvBERT model)
  • ctrl: TFCTRLForSequenceClassification (CTRL model)
  • deberta: TFDebertaForSequenceClassification (DeBERTa model)
  • deberta-v2: TFDebertaV2ForSequenceClassification (DeBERTa-v2 model)
  • distilbert: TFDistilBertForSequenceClassification (DistilBERT model)
  • electra: TFElectraForSequenceClassification (ELECTRA model)
  • esm: TFEsmForSequenceClassification (ESM model)
  • flaubert: TFFlaubertForSequenceClassification (FlauBERT model)
  • funnel: TFFunnelForSequenceClassification (Funnel Transformer model)
  • gpt-sw3: TFGPT2ForSequenceClassification (GPT-Sw3 model)
  • gpt2: TFGPT2ForSequenceClassification (OpenAI GPT-2 model)
  • gptj: TFGPTJForSequenceClassification (GPT-J model)
  • layoutlm: TFLayoutLMForSequenceClassification (LayoutLM model)
  • layoutlmv3: TFLayoutLMv3ForSequenceClassification (LayoutLMv3 model)
  • longformer: TFLongformerForSequenceClassification (Longformer model)
  • mistral: TFMistralForSequenceClassification (Mistral model)
  • mobilebert: TFMobileBertForSequenceClassification (MobileBERT model)
  • mpnet: TFMPNetForSequenceClassification (MPNet model)
  • openai-gpt: TFOpenAIGPTForSequenceClassification (OpenAI GPT model)
  • rembert: TFRemBertForSequenceClassification (RemBERT model)
  • roberta: TFRobertaForSequenceClassification (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForSequenceClassification (RoFormer model)
  • tapas: TFTapasForSequenceClassification (TAPAS model)
  • transfo-xl: TFTransfoXLForSequenceClassification (Transformer-XL model)
  • xlm: TFXLMForSequenceClassification (XLM model)
  • xlm-roberta: TFXLMRobertaForSequenceClassification (XLM-RoBERTa model)
  • xlnet: TFXLNetForSequenceClassification (XLNet model)
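The selection order above (use model_type when the config provides it, otherwise pattern-match on the name or path) can be sketched with a subset of the table. The longest-pattern-first substring rule is an assumption made here so that, for example, "xlm-roberta" is preferred over "xlm" and "roberta"; resolve_class is a hypothetical helper, not a library function.

```python
# A subset of the model_type -> class-name table above.
TF_SEQ_CLS = {
    "roberta": "TFRobertaForSequenceClassification",
    "roberta-prelayernorm": "TFRobertaPreLayerNormForSequenceClassification",
    "xlm": "TFXLMForSequenceClassification",
    "xlm-roberta": "TFXLMRobertaForSequenceClassification",
    "gpt2": "TFGPT2ForSequenceClassification",
    "gpt-sw3": "TFGPT2ForSequenceClassification",
}


def resolve_class(name_or_path, model_type=None):
    """Use model_type when known; otherwise fall back to substring matching on the name."""
    if model_type is not None:
        return TF_SEQ_CLS[model_type]
    # Try longer patterns first so "xlm-roberta" wins over "xlm" and "roberta".
    for pattern in sorted(TF_SEQ_CLS, key=len, reverse=True):
        if pattern in str(name_or_path):
            return TF_SEQ_CLS[pattern]
    raise ValueError(f"Could not infer a model type from {name_or_path!r}")


print(resolve_class("FacebookAI/xlm-roberta-base"))  # TFXLMRobertaForSequenceClassification
```

Two model types can share one class, as gpt-sw3 and gpt2 do here, so the mapping is not invertible.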

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSequenceClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForSequenceClassification

class transformers.FlaxAutoModelForSequenceClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForSequenceClassification (ALBERT model)
    • BartConfig configuration class: FlaxBartForSequenceClassification (BART model)
    • BertConfig configuration class: FlaxBertForSequenceClassification (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForSequenceClassification (BigBird model)
    • DistilBertConfig configuration class: FlaxDistilBertForSequenceClassification (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraForSequenceClassification (ELECTRA model)
    • MBartConfig configuration class: FlaxMBartForSequenceClassification (mBART model)
    • RoFormerConfig configuration class: FlaxRoFormerForSequenceClassification (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForSequenceClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForSequenceClassification (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSequenceClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForSequenceClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
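The proxies parameter above mixes protocol keys ('http') with protocol-plus-endpoint keys ('http://hostname'). The sketch below assumes requests-style precedence, where the most specific matching key wins; select_proxy here is a standalone reimplementation for illustration, and whether the Hub client in your installed version uses exactly this order is an assumption.

```python
from urllib.parse import urlparse


def select_proxy(url, proxies):
    """Return the proxy for url, preferring the most specific matching key."""
    parts = urlparse(url)
    if parts.hostname is None:
        # No host to match on: fall back to the protocol key, then the catch-all.
        return proxies.get(parts.scheme, proxies.get("all"))
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # protocol + endpoint
        parts.scheme,                          # protocol only
        f"all://{parts.hostname}",             # any protocol, this endpoint
        "all",                                 # catch-all
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/models", proxies))  # foo.bar:4012
```

With the same dictionary, any other plain-HTTP host falls back to foo.bar:3128, and an HTTPS URL gets no proxy because no "https" key is present.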

Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForSequenceClassification (ALBERT model)
  • bart: FlaxBartForSequenceClassification (BART model)
  • bert: FlaxBertForSequenceClassification (BERT model)
  • big_bird: FlaxBigBirdForSequenceClassification (BigBird model)
  • distilbert: FlaxDistilBertForSequenceClassification (DistilBERT model)
  • electra: FlaxElectraForSequenceClassification (ELECTRA model)
  • mbart: FlaxMBartForSequenceClassification (mBART model)
  • roberta: FlaxRobertaForSequenceClassification (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForSequenceClassification (RoFormer model)
  • xlm-roberta: FlaxXLMRobertaForSequenceClassification (XLM-RoBERTa model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSequenceClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForMultipleChoice

class transformers.AutoModelForMultipleChoice

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForMultipleChoice (ALBERT model)
    • BertConfig configuration class: BertForMultipleChoice (BERT model)
    • BigBirdConfig configuration class: BigBirdForMultipleChoice (BigBird model)
    • CamembertConfig configuration class: CamembertForMultipleChoice (CamemBERT model)
    • CanineConfig configuration class: CanineForMultipleChoice (CANINE model)
    • ConvBertConfig configuration class: ConvBertForMultipleChoice (ConvBERT model)
    • Data2VecTextConfig configuration class: Data2VecTextForMultipleChoice (Data2VecText model)
    • DebertaV2Config configuration class: DebertaV2ForMultipleChoice (DeBERTa-v2 model)
    • DistilBertConfig configuration class: DistilBertForMultipleChoice (DistilBERT model)
    • ElectraConfig configuration class: ElectraForMultipleChoice (ELECTRA model)
    • ErnieConfig configuration class: ErnieForMultipleChoice (ERNIE model)
    • ErnieMConfig configuration class: ErnieMForMultipleChoice (ErnieM model)
    • FNetConfig configuration class: FNetForMultipleChoice (FNet model)
    • FlaubertConfig configuration class: FlaubertForMultipleChoice (FlauBERT model)
    • FunnelConfig configuration class: FunnelForMultipleChoice (Funnel Transformer model)
    • IBertConfig configuration class: IBertForMultipleChoice (I-BERT model)
    • LongformerConfig configuration class: LongformerForMultipleChoice (Longformer model)
    • LukeConfig configuration class: LukeForMultipleChoice (LUKE model)
    • MPNetConfig configuration class: MPNetForMultipleChoice (MPNet model)
    • MegaConfig configuration class: MegaForMultipleChoice (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForMultipleChoice (Megatron-BERT model)
    • MobileBertConfig configuration class: MobileBertForMultipleChoice (MobileBERT model)
    • ModernBertConfig configuration class: ModernBertForMultipleChoice (ModernBERT model)
    • MraConfig configuration class: MraForMultipleChoice (MRA model)
    • NezhaConfig configuration class: NezhaForMultipleChoice (Nezha model)
    • NystromformerConfig configuration class: NystromformerForMultipleChoice (Nyströmformer model)
    • QDQBertConfig configuration class: QDQBertForMultipleChoice (QDQBert model)
    • RemBertConfig configuration class: RemBertForMultipleChoice (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForMultipleChoice (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForMultipleChoice (RoFormer model)
    • RobertaConfig configuration class: RobertaForMultipleChoice (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
    • SqueezeBertConfig configuration class: SqueezeBertForMultipleChoice (SqueezeBERT model)
    • XLMConfig configuration class: XLMForMultipleChoice (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForMultipleChoice (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForMultipleChoice (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetForMultipleChoice (XLNet model)
    • XmodConfig configuration class: XmodForMultipleChoice (X-MOD model)
    • YosoConfig configuration class: YosoForMultipleChoice (YOSO model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
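A multiple choice head scores each (context, choice) pair separately and then compares the scores within each example. The sketch below assumes the usual layout of batch_size x num_choices x sequence_length inputs, flattened before encoding and regrouped afterwards; score is a hypothetical stand-in for the encoder plus a one-output classifier, and nested lists stand in for tensors.

```python
def multiple_choice_logits(input_ids, score):
    """input_ids: batch_size x num_choices x seq_len nested lists; score maps one sequence to a float."""
    num_choices = len(input_ids[0])
    # Flatten to (batch_size * num_choices) sequences; each is encoded on its own.
    flat = [seq for example in input_ids for seq in example]
    flat_scores = [score(seq) for seq in flat]
    # Regroup into one row of num_choices logits per example.
    return [flat_scores[i:i + num_choices] for i in range(0, len(flat_scores), num_choices)]


def predict(logits):
    # The predicted answer is the index of the highest-scoring choice in each row.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Two examples, three choices each, two token ids per choice; sum() is a toy scorer.
batch = [[[1, 2], [3, 4], [0, 1]], [[5, 0], [1, 1], [2, 2]]]
logits = multiple_choice_logits(batch, score=sum)
print(logits, predict(logits))  # [[3, 7, 1], [5, 2, 4]] [1, 0]
```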

Examples:

>>> from transformers import AutoConfig, AutoModelForMultipleChoice

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForMultipleChoice.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
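The routing of kwargs described above can be modeled in a few lines of plain Python. This is a simplified sketch, not the library's implementation: `split_kwargs`, the attribute set, and the `my_flag` argument are hypothetical, and the sketch only sorts the keys into two groups; it does not apply the overrides to a real configuration object.

```python
from typing import Any, Dict, Set, Tuple


def split_kwargs(
    config_attributes: Set[str],
    kwargs: Dict[str, Any],
    config_provided: bool,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: return (config_overrides, model_init_kwargs).

    - config passed explicitly: every kwarg goes to the model's __init__.
    - config loaded automatically: kwargs naming a config attribute
      override that attribute; the remaining ones go to the model's __init__.
    """
    if config_provided:
        return {}, dict(kwargs)
    overrides = {k: v for k, v in kwargs.items() if k in config_attributes}
    remaining = {k: v for k, v in kwargs.items() if k not in config_attributes}
    return overrides, remaining


# Hypothetical subset of configuration attribute names, for illustration only.
attrs = {"output_attentions", "hidden_dropout_prob"}
overrides, init_kwargs = split_kwargs(
    attrs, {"output_attentions": True, "my_flag": 1}, config_provided=False
)
print(overrides)    # {'output_attentions': True}
print(init_kwargs)  # {'my_flag': 1}
```

This mirrors the output_attentions=True example below: because output_attentions is a configuration attribute, it ends up on model.config rather than being passed to __init__.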

Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForMultipleChoice (ALBERT model)
  • bert: BertForMultipleChoice (BERT model)
  • big_bird: BigBirdForMultipleChoice (BigBird model)
  • camembert: CamembertForMultipleChoice (CamemBERT model)
  • canine: CanineForMultipleChoice (CANINE model)
  • convbert: ConvBertForMultipleChoice (ConvBERT model)
  • data2vec-text: Data2VecTextForMultipleChoice (Data2VecText model)
  • deberta-v2: DebertaV2ForMultipleChoice (DeBERTa-v2 model)
  • distilbert: DistilBertForMultipleChoice (DistilBERT model)
  • electra: ElectraForMultipleChoice (ELECTRA model)
  • ernie: ErnieForMultipleChoice (ERNIE model)
  • ernie_m: ErnieMForMultipleChoice (ErnieM model)
  • flaubert: FlaubertForMultipleChoice (FlauBERT model)
  • fnet: FNetForMultipleChoice (FNet model)
  • funnel: FunnelForMultipleChoice (Funnel Transformer model)
  • ibert: IBertForMultipleChoice (I-BERT model)
  • longformer: LongformerForMultipleChoice (Longformer model)
  • luke: LukeForMultipleChoice (LUKE model)
  • mega: MegaForMultipleChoice (MEGA model)
  • megatron-bert: MegatronBertForMultipleChoice (Megatron-BERT model)
  • mobilebert: MobileBertForMultipleChoice (MobileBERT model)
  • modernbert: ModernBertForMultipleChoice (ModernBERT model)
  • mpnet: MPNetForMultipleChoice (MPNet model)
  • mra: MraForMultipleChoice (MRA model)
  • nezha: NezhaForMultipleChoice (Nezha model)
  • nystromformer: NystromformerForMultipleChoice (Nyströmformer model)
  • qdqbert: QDQBertForMultipleChoice (QDQBert model)
  • rembert: RemBertForMultipleChoice (RemBERT model)
  • roberta: RobertaForMultipleChoice (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForMultipleChoice (RoCBert model)
  • roformer: RoFormerForMultipleChoice (RoFormer model)
  • squeezebert: SqueezeBertForMultipleChoice (SqueezeBERT model)
  • xlm: XLMForMultipleChoice (XLM model)
  • xlm-roberta: XLMRobertaForMultipleChoice (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForMultipleChoice (XLM-RoBERTa-XL model)
  • xlnet: XLNetForMultipleChoice (XLNet model)
  • xmod: XmodForMultipleChoice (X-MOD model)
  • yoso: YosoForMultipleChoice (YOSO model)
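The selection rule above (use model_type when it is available, otherwise match against the name or path) can be sketched as follows. The four-entry table is a hypothetical excerpt of the list, the function returns class names rather than classes, and trying longer keys first is an assumption of this sketch: it keeps "xlm-roberta" from being matched as "roberta", "bert" or "xlm" (note that "bert" is a substring of "roberta").

```python
from typing import Dict, Optional

# Hypothetical excerpt of the model_type -> class-name table listed above.
MULTIPLE_CHOICE_CLASSES: Dict[str, str] = {
    "bert": "BertForMultipleChoice",
    "roberta": "RobertaForMultipleChoice",
    "xlm": "XLMForMultipleChoice",
    "xlm-roberta": "XLMRobertaForMultipleChoice",
}


def resolve_class_name(model_type: Optional[str], name_or_path: str) -> str:
    """Sketch: pick a class from config.model_type, else by name matching."""
    if model_type is not None:
        return MULTIPLE_CHOICE_CLASSES[model_type]
    # Fallback: substring match, trying longer keys first so that the most
    # specific model_type wins when several keys occur in the name.
    for pattern in sorted(MULTIPLE_CHOICE_CLASSES, key=len, reverse=True):
        if pattern in name_or_path:
            return MULTIPLE_CHOICE_CLASSES[pattern]
    raise ValueError(f"Unrecognized model in {name_or_path!r}")


print(resolve_class_name(None, "my-org/xlm-roberta-base-finetuned"))
# XLMRobertaForMultipleChoice
```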

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForMultipleChoice

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMultipleChoice.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForMultipleChoice

class transformers.TFAutoModelForMultipleChoice

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForMultipleChoice (ALBERT model)
    • BertConfig configuration class: TFBertForMultipleChoice (BERT model)
    • CamembertConfig configuration class: TFCamembertForMultipleChoice (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertForMultipleChoice (ConvBERT model)
    • DebertaV2Config configuration class: TFDebertaV2ForMultipleChoice (DeBERTa-v2 model)
    • DistilBertConfig configuration class: TFDistilBertForMultipleChoice (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForMultipleChoice (ELECTRA model)
    • FlaubertConfig configuration class: TFFlaubertForMultipleChoice (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForMultipleChoice (Funnel Transformer model)
    • LongformerConfig configuration class: TFLongformerForMultipleChoice (Longformer model)
    • MPNetConfig configuration class: TFMPNetForMultipleChoice (MPNet model)
    • MobileBertConfig configuration class: TFMobileBertForMultipleChoice (MobileBERT model)
    • RemBertConfig configuration class: TFRemBertForMultipleChoice (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForMultipleChoice (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForMultipleChoice (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
    • XLMConfig configuration class: TFXLMForMultipleChoice (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForMultipleChoice (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetForMultipleChoice (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
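The configuration-class dispatch and the note about weights can be sketched with stand-in classes. Every class below is a hypothetical stub (none of them is a real Transformers class), the exact-type lookup through type(config) is a simplifying assumption, and the stub only records that no pretrained weights were loaded instead of building real layers.

```python
from typing import Dict, Type


class ConfigStub:
    """Hypothetical stand-in for PretrainedConfig."""


class AlbertConfigStub(ConfigStub):
    pass


class BertConfigStub(ConfigStub):
    pass


class ModelStub:
    """Hypothetical model built only from a configuration."""

    def __init__(self, config: ConfigStub) -> None:
        self.config = config
        # from_config() builds the architecture; no pretrained weights are loaded.
        self.pretrained_weights_loaded = False


class TFAlbertForMultipleChoiceStub(ModelStub):
    pass


class TFBertForMultipleChoiceStub(ModelStub):
    pass


# Configuration class -> model class, mirroring two rows of the list above.
MODEL_FOR_MULTIPLE_CHOICE: Dict[Type[ConfigStub], Type[ModelStub]] = {
    AlbertConfigStub: TFAlbertForMultipleChoiceStub,
    BertConfigStub: TFBertForMultipleChoiceStub,
}


def from_config(config: ConfigStub) -> ModelStub:
    """Sketch: select the model class from the configuration's class."""
    model_cls = MODEL_FOR_MULTIPLE_CHOICE.get(type(config))
    if model_cls is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    return model_cls(config)


model = from_config(BertConfigStub())
print(type(model).__name__)  # TFBertForMultipleChoiceStub
```

To obtain trained weights, the documented route is from_pretrained(), not from_config().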

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMultipleChoice

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForMultipleChoice.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
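The requirement stated for a PyTorch state_dict file (set from_pt=True and pass config) can be expressed as an argument check. This is a hypothetical helper, not part of the library: it assumes that a path ending in ".bin" means a single state_dict file, which is not one of the cases in which the configuration is loaded automatically (see config above), so both arguments are required.

```python
from typing import Any, Optional


def check_state_dict_args(
    pretrained_model_name_or_path: str,
    from_pt: bool = False,
    config: Optional[Any] = None,
) -> None:
    """Hypothetical check for the rules described for PyTorch state_dict files.

    Assumption: a path ending in ".bin" is treated as a single state_dict file.
    """
    if not pretrained_model_name_or_path.endswith(".bin"):
        return  # Hub ids and save_pretrained() directories are not checked here.
    if not from_pt:
        raise ValueError("Loading a PyTorch state_dict file requires from_pt=True.")
    if config is None:
        raise ValueError("Loading a PyTorch state_dict file requires config=...")


# Passes: both documented requirements are met (object() stands in for a config).
check_state_dict_args("./pt_model/pytorch_model.bin", from_pt=True, config=object())
```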

Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForMultipleChoice (ALBERT model)
  • bert: TFBertForMultipleChoice (BERT model)
  • camembert: TFCamembertForMultipleChoice (CamemBERT model)
  • convbert: TFConvBertForMultipleChoice (ConvBERT model)
  • deberta-v2: TFDebertaV2ForMultipleChoice (DeBERTa-v2 model)
  • distilbert: TFDistilBertForMultipleChoice (DistilBERT model)
  • electra: TFElectraForMultipleChoice (ELECTRA model)
  • flaubert: TFFlaubertForMultipleChoice (FlauBERT model)
  • funnel: TFFunnelForMultipleChoice (Funnel Transformer model)
  • longformer: TFLongformerForMultipleChoice (Longformer model)
  • mobilebert: TFMobileBertForMultipleChoice (MobileBERT model)
  • mpnet: TFMPNetForMultipleChoice (MPNet model)
  • rembert: TFRemBertForMultipleChoice (RemBERT model)
  • roberta: TFRobertaForMultipleChoice (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForMultipleChoice (RoFormer model)
  • xlm: TFXLMForMultipleChoice (XLM model)
  • xlm-roberta: TFXLMRobertaForMultipleChoice (XLM-RoBERTa model)
  • xlnet: TFXLNetForMultipleChoice (XLNet model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMultipleChoice

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMultipleChoice.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForMultipleChoice

class transformers.FlaxAutoModelForMultipleChoice

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForMultipleChoice (ALBERT model)
    • BertConfig configuration class: FlaxBertForMultipleChoice (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForMultipleChoice (BigBird model)
    • DistilBertConfig configuration class: FlaxDistilBertForMultipleChoice (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraForMultipleChoice (ELECTRA model)
    • RoFormerConfig configuration class: FlaxRoFormerForMultipleChoice (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForMultipleChoice (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForMultipleChoice (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForMultipleChoice

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForMultipleChoice.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model once and loading the converted Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
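The local-directory case in the config entry above (a directory containing a file named config.json) can be checked with the standard library. This is a hypothetical helper for illustration only; it does not model Hub model ids or how the library actually resolves files.

```python
import os


def config_can_be_autoloaded(pretrained_model_name_or_path: str) -> bool:
    """Hypothetical helper for the local-directory case listed above.

    True when the argument is an existing directory that contains a file
    named config.json. Hub model ids are not modeled here.
    """
    return os.path.isdir(pretrained_model_name_or_path) and os.path.isfile(
        os.path.join(pretrained_model_name_or_path, "config.json")
    )
```

A directory written by save_pretrained() contains a config.json, which is why the second and third cases in that list overlap for local paths.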

Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForMultipleChoice (ALBERT model)
  • bert: FlaxBertForMultipleChoice (BERT model)
  • big_bird: FlaxBigBirdForMultipleChoice (BigBird model)
  • distilbert: FlaxDistilBertForMultipleChoice (DistilBERT model)
  • electra: FlaxElectraForMultipleChoice (ELECTRA model)
  • roberta: FlaxRobertaForMultipleChoice (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForMultipleChoice (RoFormer model)
  • xlm-roberta: FlaxXLMRobertaForMultipleChoice (XLM-RoBERTa model)
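Read side by side, the PyTorch, TensorFlow and Flax lists on this page show which model_type values have a multiple-choice head in each backend. The sketch below copies the keys from those three lists (so it reflects this documentation version only and does not query the library) and compares them with set operations.

```python
# model_type keys copied from the AutoModelForMultipleChoice list on this page.
PT_TYPES = {
    "albert", "bert", "big_bird", "camembert", "canine", "convbert",
    "data2vec-text", "deberta-v2", "distilbert", "electra", "ernie", "ernie_m",
    "flaubert", "fnet", "funnel", "ibert", "longformer", "luke", "mega",
    "megatron-bert", "mobilebert", "modernbert", "mpnet", "mra", "nezha",
    "nystromformer", "qdqbert", "rembert", "roberta", "roberta-prelayernorm",
    "roc_bert", "roformer", "squeezebert", "xlm", "xlm-roberta",
    "xlm-roberta-xl", "xlnet", "xmod", "yoso",
}
# From the TFAutoModelForMultipleChoice list.
TF_TYPES = {
    "albert", "bert", "camembert", "convbert", "deberta-v2", "distilbert",
    "electra", "flaubert", "funnel", "longformer", "mobilebert", "mpnet",
    "rembert", "roberta", "roberta-prelayernorm", "roformer", "xlm",
    "xlm-roberta", "xlnet",
}
# From the FlaxAutoModelForMultipleChoice list.
FLAX_TYPES = {
    "albert", "bert", "big_bird", "distilbert", "electra", "roberta",
    "roberta-prelayernorm", "roformer", "xlm-roberta",
}

# model_type values with a multiple-choice head in all three backends.
in_all_backends = sorted(PT_TYPES & TF_TYPES & FLAX_TYPES)
# model_type values whose multiple-choice head exists only for PyTorch.
pytorch_only = sorted(PT_TYPES - TF_TYPES - FLAX_TYPES)

print(in_all_backends)
# ['albert', 'bert', 'distilbert', 'electra', 'roberta', 'roberta-prelayernorm', 'roformer', 'xlm-roberta']
```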

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForMultipleChoice

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForNextSentencePrediction

class transformers.AutoModelForNextSentencePrediction

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BertConfig configuration class: BertForNextSentencePrediction (BERT model)
    • ErnieConfig configuration class: ErnieForNextSentencePrediction (ERNIE model)
    • FNetConfig configuration class: FNetForNextSentencePrediction (FNet model)
    • MegatronBertConfig configuration class: MegatronBertForNextSentencePrediction (Megatron-BERT model)
    • MobileBertConfig configuration class: MobileBertForNextSentencePrediction (MobileBERT model)
    • NezhaConfig configuration class: NezhaForNextSentencePrediction (Nezha model)
    • QDQBertConfig configuration class: QDQBertForNextSentencePrediction (QDQBert model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForNextSentencePrediction

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForNextSentencePrediction.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • bert: BertForNextSentencePrediction (BERT model)
  • ernie: ErnieForNextSentencePrediction (ERNIE model)
  • fnet: FNetForNextSentencePrediction (FNet model)
  • megatron-bert: MegatronBertForNextSentencePrediction (Megatron-BERT model)
  • mobilebert: MobileBertForNextSentencePrediction (MobileBERT model)
  • nezha: NezhaForNextSentencePrediction (Nezha model)
  • qdqbert: QDQBertForNextSentencePrediction (QDQBert model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForNextSentencePrediction

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForNextSentencePrediction.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForNextSentencePrediction

class transformers.TFAutoModelForNextSentencePrediction

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForNextSentencePrediction

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForNextSentencePrediction.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, TFAutoModelForNextSentencePrediction

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForNextSentencePrediction

class transformers.FlaxAutoModelForNextSentencePrediction

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForNextSentencePrediction

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForNextSentencePrediction.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForNextSentencePrediction

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForTokenClassification

class transformers.AutoModelForTokenClassification

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForTokenClassification (ALBERT model)
    • ApertusConfig configuration class: ApertusForTokenClassification (Apertus model)
    • ArceeConfig configuration class: ArceeForTokenClassification (Arcee model)
    • BertConfig configuration class: BertForTokenClassification (BERT model)
    • BigBirdConfig configuration class: BigBirdForTokenClassification (BigBird model)
    • BioGptConfig configuration class: BioGptForTokenClassification (BioGpt model)
    • BloomConfig configuration class: BloomForTokenClassification (BLOOM model)
    • BrosConfig configuration class: BrosForTokenClassification (BROS model)
    • CamembertConfig configuration class: CamembertForTokenClassification (CamemBERT model)
    • CanineConfig configuration class: CanineForTokenClassification (CANINE model)
    • ConvBertConfig configuration class: ConvBertForTokenClassification (ConvBERT model)
    • Data2VecTextConfig configuration class: Data2VecTextForTokenClassification (Data2VecText model)
    • DebertaConfig configuration class: DebertaForTokenClassification (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2ForTokenClassification (DeBERTa-v2 model)
    • DiffLlamaConfig configuration class: DiffLlamaForTokenClassification (DiffLlama model)
    • DistilBertConfig configuration class: DistilBertForTokenClassification (DistilBERT model)
    • ElectraConfig configuration class: ElectraForTokenClassification (ELECTRA model)
    • ErnieConfig configuration class: ErnieForTokenClassification (ERNIE model)
    • ErnieMConfig configuration class: ErnieMForTokenClassification (ErnieM model)
    • EsmConfig configuration class: EsmForTokenClassification (ESM model)
    • Exaone4Config configuration class: Exaone4ForTokenClassification (EXAONE-4.0 model)
    • FNetConfig configuration class: FNetForTokenClassification (FNet model)
    • FalconConfig configuration class: FalconForTokenClassification (Falcon model)
    • FlaubertConfig configuration class: FlaubertForTokenClassification (FlauBERT model)
    • FunnelConfig configuration class: FunnelForTokenClassification (Funnel Transformer model)
    • GPT2Config configuration class: GPT2ForTokenClassification (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeForTokenClassification (GPTBigCode model)
    • GPTNeoConfig configuration class: GPTNeoForTokenClassification (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXForTokenClassification (GPT NeoX model)
    • Gemma2Config configuration class: Gemma2ForTokenClassification (Gemma2 model)
    • GemmaConfig configuration class: GemmaForTokenClassification (Gemma model)
    • Glm4Config configuration class: Glm4ForTokenClassification (GLM4 model)
    • GlmConfig configuration class: GlmForTokenClassification (GLM model)
    • GptOssConfig configuration class: GptOssForTokenClassification (GptOss model)
    • HeliumConfig configuration class: HeliumForTokenClassification (Helium model)
    • IBertConfig configuration class: IBertForTokenClassification (I-BERT model)
    • LayoutLMConfig configuration class: LayoutLMForTokenClassification (LayoutLM model)
    • LayoutLMv2Config configuration class: LayoutLMv2ForTokenClassification (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3ForTokenClassification (LayoutLMv3 model)
    • LiltConfig configuration class: LiltForTokenClassification (LiLT model)
    • LlamaConfig configuration class: LlamaForTokenClassification (LLaMA model)
    • LongformerConfig configuration class: LongformerForTokenClassification (Longformer model)
    • LukeConfig configuration class: LukeForTokenClassification (LUKE model)
    • MPNetConfig configuration class: MPNetForTokenClassification (MPNet model)
    • MT5Config configuration class: MT5ForTokenClassification (MT5 model)
    • MarkupLMConfig configuration class: MarkupLMForTokenClassification (MarkupLM model)
    • MegaConfig configuration class: MegaForTokenClassification (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForTokenClassification (Megatron-BERT model)
    • MiniMaxConfig configuration class: MiniMaxForTokenClassification (MiniMax model)
    • MistralConfig configuration class: MistralForTokenClassification (Mistral model)
    • MixtralConfig configuration class: MixtralForTokenClassification (Mixtral model)
    • MobileBertConfig configuration class: MobileBertForTokenClassification (MobileBERT model)
    • ModernBertConfig configuration class: ModernBertForTokenClassification (ModernBERT model)
    • MptConfig configuration class: MptForTokenClassification (MPT model)
    • MraConfig configuration class: MraForTokenClassification (MRA model)
    • NemotronConfig configuration class: NemotronForTokenClassification (Nemotron model)
    • NezhaConfig configuration class: NezhaForTokenClassification (Nezha model)
    • NystromformerConfig configuration class: NystromformerForTokenClassification (Nyströmformer model)
    • PersimmonConfig configuration class: PersimmonForTokenClassification (Persimmon model)
    • Phi3Config configuration class: Phi3ForTokenClassification (Phi3 model)
    • PhiConfig configuration class: PhiForTokenClassification (Phi model)
    • QDQBertConfig configuration class: QDQBertForTokenClassification (QDQBert model)
    • Qwen2Config configuration class: Qwen2ForTokenClassification (Qwen2 model)
    • Qwen2MoeConfig configuration class: Qwen2MoeForTokenClassification (Qwen2MoE model)
    • Qwen3Config configuration class: Qwen3ForTokenClassification (Qwen3 model)
    • Qwen3MoeConfig configuration class: Qwen3MoeForTokenClassification (Qwen3MoE model)
    • RemBertConfig configuration class: RemBertForTokenClassification (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForTokenClassification (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForTokenClassification (RoFormer model)
    • RobertaConfig configuration class: RobertaForTokenClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
    • SeedOssConfig configuration class: SeedOssForTokenClassification (SeedOss model)
    • SmolLM3Config configuration class: SmolLM3ForTokenClassification (SmolLM3 model)
    • SqueezeBertConfig configuration class: SqueezeBertForTokenClassification (SqueezeBERT model)
    • StableLmConfig configuration class: StableLmForTokenClassification (StableLm model)
    • Starcoder2Config configuration class: Starcoder2ForTokenClassification (Starcoder2 model)
    • T5Config configuration class: T5ForTokenClassification (T5 model)
    • T5GemmaConfig configuration class: T5GemmaForTokenClassification (T5Gemma model)
    • UMT5Config configuration class: UMT5ForTokenClassification (UMT5 model)
    • XLMConfig configuration class: XLMForTokenClassification (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForTokenClassification (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForTokenClassification (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetForTokenClassification (XLNet model)
    • XmodConfig configuration class: XmodForTokenClassification (X-MOD model)
    • YosoConfig configuration class: YosoForTokenClassification (YOSO model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a token classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForTokenClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForTokenClassification.from_config(config)
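The sketch below illustrates the note above that from_config() does not load any weights. Two models built from the same config with different random seeds end up with different weights. Reloading a saved model with from_pretrained() restores its weights exactly. It assumes torch and transformers are installed and uses a tiny, arbitrary config so nothing is downloaded.

```python
import tempfile

import torch
from transformers import AutoModelForTokenClassification, BertConfig

# Tiny BERT config (arbitrary sizes) so the sketch runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=37)

torch.manual_seed(0)
model_a = AutoModelForTokenClassification.from_config(config)
torch.manual_seed(1)
model_b = AutoModelForTokenClassification.from_config(config)

# Different seeds give different random initializations: nothing was loaded.
same_random_init = torch.equal(model_a.classifier.weight, model_b.classifier.weight)  # False

with tempfile.TemporaryDirectory() as tmp_dir:
    model_a.save_pretrained(tmp_dir)
    reloaded = AutoModelForTokenClassification.from_pretrained(tmp_dir)

# from_pretrained() restores the saved weights.
weights_restored = torch.equal(model_a.classifier.weight, reloaded.classifier.weight)  # True
```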

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
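The sketch below shows both kwargs behaviors described above, plus output_loading_info. Without config, num_labels matches a configuration attribute, so it overrides the loaded config. With config, the configuration is updated up front through AutoConfig. A bare BertModel is saved locally as the stand-in "pretrained" checkpoint, so the token classification head is absent and gets reported among the missing keys. This assumes torch and transformers are installed. The sizes and label counts are arbitrary.

```python
import tempfile

from transformers import AutoConfig, AutoModelForTokenClassification, BertConfig, BertModel

# Tiny BERT config (arbitrary sizes) so the sketch runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=37)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save a bare encoder (no token classification head) as the checkpoint.
    BertModel(config).save_pretrained(tmp_dir)

    # No config passed: num_labels is a config attribute, so it overrides the loaded value.
    model, loading_info = AutoModelForTokenClassification.from_pretrained(
        tmp_dir, num_labels=5, output_loading_info=True
    )

    # Config passed explicitly: apply configuration updates before loading.
    custom_config = AutoConfig.from_pretrained(tmp_dir, num_labels=3)
    model_three_labels = AutoModelForTokenClassification.from_pretrained(tmp_dir, config=custom_config)

print(model.config.num_labels, model.classifier.out_features)  # 5 5
print(model_three_labels.config.num_labels)  # 3
# The classifier weights are not in the checkpoint, so they are newly initialized and listed here.
print(sorted(loading_info["missing_keys"]))
```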

Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForTokenClassification (ALBERT model)
  • apertus: ApertusForTokenClassification (Apertus model)
  • arcee: ArceeForTokenClassification (Arcee model)
  • bert: BertForTokenClassification (BERT model)
  • big_bird: BigBirdForTokenClassification (BigBird model)
  • biogpt: BioGptForTokenClassification (BioGpt model)
  • bloom: BloomForTokenClassification (BLOOM model)
  • bros: BrosForTokenClassification (BROS model)
  • camembert: CamembertForTokenClassification (CamemBERT model)
  • canine: CanineForTokenClassification (CANINE model)
  • convbert: ConvBertForTokenClassification (ConvBERT model)
  • data2vec-text: Data2VecTextForTokenClassification (Data2VecText model)
  • deberta: DebertaForTokenClassification (DeBERTa model)
  • deberta-v2: DebertaV2ForTokenClassification (DeBERTa-v2 model)
  • diffllama: DiffLlamaForTokenClassification (DiffLlama model)
  • distilbert: DistilBertForTokenClassification (DistilBERT model)
  • electra: ElectraForTokenClassification (ELECTRA model)
  • ernie: ErnieForTokenClassification (ERNIE model)
  • ernie_m: ErnieMForTokenClassification (ErnieM model)
  • esm: EsmForTokenClassification (ESM model)
  • exaone4: Exaone4ForTokenClassification (EXAONE-4.0 model)
  • falcon: FalconForTokenClassification (Falcon model)
  • flaubert: FlaubertForTokenClassification (FlauBERT model)
  • fnet: FNetForTokenClassification (FNet model)
  • funnel: FunnelForTokenClassification (Funnel Transformer model)
  • gemma: GemmaForTokenClassification (Gemma model)
  • gemma2: Gemma2ForTokenClassification (Gemma2 model)
  • glm: GlmForTokenClassification (GLM model)
  • glm4: Glm4ForTokenClassification (GLM4 model)
  • gpt-sw3: GPT2ForTokenClassification (GPT-Sw3 model)
  • gpt2: GPT2ForTokenClassification (OpenAI GPT-2 model)
  • gpt_bigcode: GPTBigCodeForTokenClassification (GPTBigCode model)
  • gpt_neo: GPTNeoForTokenClassification (GPT Neo model)
  • gpt_neox: GPTNeoXForTokenClassification (GPT NeoX model)
  • gpt_oss: GptOssForTokenClassification (GptOss model)
  • helium: HeliumForTokenClassification (Helium model)
  • ibert: IBertForTokenClassification (I-BERT model)
  • layoutlm: LayoutLMForTokenClassification (LayoutLM model)
  • layoutlmv2: LayoutLMv2ForTokenClassification (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3ForTokenClassification (LayoutLMv3 model)
  • lilt: LiltForTokenClassification (LiLT model)
  • llama: LlamaForTokenClassification (LLaMA model)
  • longformer: LongformerForTokenClassification (Longformer model)
  • luke: LukeForTokenClassification (LUKE model)
  • markuplm: MarkupLMForTokenClassification (MarkupLM model)
  • mega: MegaForTokenClassification (MEGA model)
  • megatron-bert: MegatronBertForTokenClassification (Megatron-BERT model)
  • minimax: MiniMaxForTokenClassification (MiniMax model)
  • mistral: MistralForTokenClassification (Mistral model)
  • mixtral: MixtralForTokenClassification (Mixtral model)
  • mobilebert: MobileBertForTokenClassification (MobileBERT model)
  • modernbert: ModernBertForTokenClassification (ModernBERT model)
  • mpnet: MPNetForTokenClassification (MPNet model)
  • mpt: MptForTokenClassification (MPT model)
  • mra: MraForTokenClassification (MRA model)
  • mt5: MT5ForTokenClassification (MT5 model)
  • nemotron: NemotronForTokenClassification (Nemotron model)
  • nezha: NezhaForTokenClassification (Nezha model)
  • nystromformer: NystromformerForTokenClassification (Nyströmformer model)
  • persimmon: PersimmonForTokenClassification (Persimmon model)
  • phi: PhiForTokenClassification (Phi model)
  • phi3: Phi3ForTokenClassification (Phi3 model)
  • qdqbert: QDQBertForTokenClassification (QDQBert model)
  • qwen2: Qwen2ForTokenClassification (Qwen2 model)
  • qwen2_moe: Qwen2MoeForTokenClassification (Qwen2MoE model)
  • qwen3: Qwen3ForTokenClassification (Qwen3 model)
  • qwen3_moe: Qwen3MoeForTokenClassification (Qwen3MoE model)
  • rembert: RemBertForTokenClassification (RemBERT model)
  • roberta: RobertaForTokenClassification (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForTokenClassification (RoCBert model)
  • roformer: RoFormerForTokenClassification (RoFormer model)
  • seed_oss: SeedOssForTokenClassification (SeedOss model)
  • smollm3: SmolLM3ForTokenClassification (SmolLM3 model)
  • squeezebert: SqueezeBertForTokenClassification (SqueezeBERT model)
  • stablelm: StableLmForTokenClassification (StableLm model)
  • starcoder2: Starcoder2ForTokenClassification (Starcoder2 model)
  • t5: T5ForTokenClassification (T5 model)
  • t5gemma: T5GemmaForTokenClassification (T5Gemma model)
  • umt5: UMT5ForTokenClassification (UMT5 model)
  • xlm: XLMForTokenClassification (XLM model)
  • xlm-roberta: XLMRobertaForTokenClassification (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForTokenClassification (XLM-RoBERTa-XL model)
  • xlnet: XLNetForTokenClassification (XLNet model)
  • xmod: XmodForTokenClassification (X-MOD model)
  • yoso: YosoForTokenClassification (YOSO model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().
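To see why the mode matters, the sketch below models an inverted-dropout layer in plain Python. ToyDropout is a hypothetical class, not PyTorch's nn.Dropout; it only assumes that, like a PyTorch module, it carries a training flag that train() and eval() toggle.

```python
import random


class ToyDropout:
    """Hypothetical inverted-dropout layer that mimics train/eval switching."""

    def __init__(self, p=0.1, seed=0):
        self.p = p
        self.training = True  # freshly constructed layers start in training mode
        self._rng = random.Random(seed)

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training or self.p == 0.0:
            # Evaluation mode: dropout is a no-op, so outputs are deterministic.
            return list(xs)
        keep = 1.0 - self.p
        # Training mode: zero each element with probability p and rescale the
        # survivors by 1 / (1 - p) so the expected value is unchanged.
        return [x / keep if self._rng.random() < keep else 0.0 for x in xs]


layer = ToyDropout(p=0.5).eval()  # like a model returned by from_pretrained()
print(layer([1.0, 2.0, 3.0]))     # prints [1.0, 2.0, 3.0]
layer.train()                     # re-enable dropout before fine-tuning
```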

Examples:

>>> from transformers import AutoConfig, AutoModelForTokenClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForTokenClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForTokenClassification

class transformers.TFAutoModelForTokenClassification

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
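The auto classes act as factories: calling a class method returns an instance of a concrete architecture class, never of the auto class itself. A minimal sketch of that pattern, using hypothetical Toy* classes and assuming an EnvironmentError for the refused constructor (the real error type and message may differ):

```python
class ToyBertForTokenClassification:
    """Hypothetical concrete model class standing in for a real architecture."""

    def __init__(self, config):
        self.config = config


class ToyAutoModelForTokenClassification:
    """Hypothetical auto class: a factory that never builds instances of itself."""

    # Maps a model_type string to the concrete class the factory should build.
    _model_mapping = {"bert": ToyBertForTokenClassification}

    def __init__(self, *args, **kwargs):
        # Direct construction is refused, as documented for the real auto classes.
        raise EnvironmentError(
            f"{type(self).__name__} is designed to be instantiated with "
            f"`{type(self).__name__}.from_config(config)`."
        )

    @classmethod
    def from_config(cls, config):
        # Look up the concrete class and return an instance of *that* class.
        model_class = cls._model_mapping[config["model_type"]]
        return model_class(config)


model = ToyAutoModelForTokenClassification.from_config({"model_type": "bert"})
print(type(model).__name__)  # prints ToyBertForTokenClassification
```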

from_config

(**kwargs)

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForTokenClassification (ALBERT model)
    • BertConfig configuration class: TFBertForTokenClassification (BERT model)
    • CamembertConfig configuration class: TFCamembertForTokenClassification (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertForTokenClassification (ConvBERT model)
    • DebertaConfig configuration class: TFDebertaForTokenClassification (DeBERTa model)
    • DebertaV2Config configuration class: TFDebertaV2ForTokenClassification (DeBERTa-v2 model)
    • DistilBertConfig configuration class: TFDistilBertForTokenClassification (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForTokenClassification (ELECTRA model)
    • EsmConfig configuration class: TFEsmForTokenClassification (ESM model)
    • FlaubertConfig configuration class: TFFlaubertForTokenClassification (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForTokenClassification (Funnel Transformer model)
    • LayoutLMConfig configuration class: TFLayoutLMForTokenClassification (LayoutLM model)
    • LayoutLMv3Config configuration class: TFLayoutLMv3ForTokenClassification (LayoutLMv3 model)
    • LongformerConfig configuration class: TFLongformerForTokenClassification (Longformer model)
    • MPNetConfig configuration class: TFMPNetForTokenClassification (MPNet model)
    • MobileBertConfig configuration class: TFMobileBertForTokenClassification (MobileBERT model)
    • RemBertConfig configuration class: TFRemBertForTokenClassification (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForTokenClassification (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForTokenClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
    • XLMConfig configuration class: TFXLMForTokenClassification (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForTokenClassification (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetForTokenClassification (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
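Selection by configuration class can be pictured as a lookup keyed on type(config). The *Sketch configuration classes below are hypothetical stand-ins; only two of the documented entries are copied, and GPT-2 is used because it has no entry in this TF list. The exact-type lookup and the ValueError are assumptions about how an unsupported configuration is rejected.

```python
class PretrainedConfigSketch:
    """Hypothetical stand-in for PretrainedConfig."""

    model_type = ""


class AlbertConfigSketch(PretrainedConfigSketch):
    model_type = "albert"


class BertConfigSketch(PretrainedConfigSketch):
    model_type = "bert"


class GPT2ConfigSketch(PretrainedConfigSketch):
    model_type = "gpt2"


# Configuration class -> model class name, copied from two entries of the list above.
TF_TOKEN_CLASSIFICATION_BY_CONFIG = {
    AlbertConfigSketch: "TFAlbertForTokenClassification",
    BertConfigSketch: "TFBertForTokenClassification",
}


def select_model_class(config, mapping=TF_TOKEN_CLASSIFICATION_BY_CONFIG):
    """Pick the model class from the *type* of the config object."""
    config_class = type(config)
    if config_class not in mapping:
        # Configurations without an entry in the mapping are rejected.
        raise ValueError(
            f"Unrecognized configuration class {config_class.__name__} "
            "for this kind of auto model."
        )
    return mapping[config_class]


print(select_model_class(BertConfigSketch()))  # prints TFBertForTokenClassification
```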

Instantiates one of the model classes of the library (with a token classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
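The difference between from_config() and from_pretrained() is only where the weights come from. This plain-Python sketch uses a hypothetical ToyModel, a dict standing in for the configuration, and a list standing in for a checkpoint:

```python
import random


class ToyModel:
    """Hypothetical model holding one weight vector sized by its config."""

    def __init__(self, config, seed=None):
        rng = random.Random(seed)
        std = config["initializer_range"]
        # Every new instance starts from freshly drawn random weights.
        self.weights = [rng.gauss(0.0, std) for _ in range(config["hidden_size"])]


def toy_from_config(config, seed=None):
    # Architecture only: the weights stay at their random initialization.
    return ToyModel(config, seed=seed)


def toy_from_pretrained(config, saved_weights):
    # Build the same architecture, then overwrite the weights with a checkpoint.
    model = ToyModel(config)
    if len(saved_weights) != config["hidden_size"]:
        raise ValueError("checkpoint shape does not match the configuration")
    model.weights = list(saved_weights)
    return model


config = {"hidden_size": 4, "initializer_range": 0.02}
checkpoint = [0.1, -0.2, 0.3, 0.0]

untrained_a = toy_from_config(config, seed=1)
untrained_b = toy_from_config(config, seed=2)
trained = toy_from_pretrained(config, checkpoint)
```

Two from_config() calls give two differently initialized models, while the from_pretrained()-style call reproduces the stored values.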

Examples:

>>> from transformers import AutoConfig, TFAutoModelForTokenClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForTokenClassification.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have already been done).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
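The two kwargs rules above can be restated as a small function. This is a sketch under simplifying assumptions: configurations are plain dicts, resolve_kwargs is a hypothetical helper, and my_flag is a hypothetical model argument rather than a real Transformers option.

```python
def resolve_kwargs(loaded_config, kwargs, config=None):
    """Return (final_config, model_init_kwargs) following the documented rule.

    loaded_config: attributes of the configuration that would be auto-loaded.
    config: an explicitly supplied configuration, or None.
    """
    if config is not None:
        # Explicit config: it is used as-is and every kwarg goes to __init__.
        return dict(config), dict(kwargs)

    final_config = dict(loaded_config)
    model_kwargs = {}
    for key, value in kwargs.items():
        if key in final_config:
            # Known configuration attribute: override the loaded value.
            final_config[key] = value
        else:
            # Anything else is forwarded to the model's __init__.
            model_kwargs[key] = value
    return final_config, model_kwargs


loaded = {"output_attentions": False, "num_labels": 2}
print(resolve_kwargs(loaded, {"output_attentions": True, "my_flag": 1}))
# prints ({'output_attentions': True, 'num_labels': 2}, {'my_flag': 1})
```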

Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForTokenClassification (ALBERT model)
  • bert: TFBertForTokenClassification (BERT model)
  • camembert: TFCamembertForTokenClassification (CamemBERT model)
  • convbert: TFConvBertForTokenClassification (ConvBERT model)
  • deberta: TFDebertaForTokenClassification (DeBERTa model)
  • deberta-v2: TFDebertaV2ForTokenClassification (DeBERTa-v2 model)
  • distilbert: TFDistilBertForTokenClassification (DistilBERT model)
  • electra: TFElectraForTokenClassification (ELECTRA model)
  • esm: TFEsmForTokenClassification (ESM model)
  • flaubert: TFFlaubertForTokenClassification (FlauBERT model)
  • funnel: TFFunnelForTokenClassification (Funnel Transformer model)
  • layoutlm: TFLayoutLMForTokenClassification (LayoutLM model)
  • layoutlmv3: TFLayoutLMv3ForTokenClassification (LayoutLMv3 model)
  • longformer: TFLongformerForTokenClassification (Longformer model)
  • mobilebert: TFMobileBertForTokenClassification (MobileBERT model)
  • mpnet: TFMPNetForTokenClassification (MPNet model)
  • rembert: TFRemBertForTokenClassification (RemBERT model)
  • roberta: TFRobertaForTokenClassification (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForTokenClassification (RoFormer model)
  • xlm: TFXLMForTokenClassification (XLM model)
  • xlm-roberta: TFXLMRobertaForTokenClassification (XLM-RoBERTa model)
  • xlnet: TFXLNetForTokenClassification (XLNet model)
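A rough sketch of this selection order, using a handful of entries from the list above. pick_model_class is a hypothetical helper. It assumes that when several keys match a name (for example "xlm", "roberta" and "xlm-roberta"), the longest key is preferred; treat that tie-breaking rule as an assumption rather than guaranteed library behavior.

```python
TF_TOKEN_CLASSIFICATION = {
    "albert": "TFAlbertForTokenClassification",
    "bert": "TFBertForTokenClassification",
    "distilbert": "TFDistilBertForTokenClassification",
    "roberta": "TFRobertaForTokenClassification",
    "xlm": "TFXLMForTokenClassification",
    "xlm-roberta": "TFXLMRobertaForTokenClassification",
}


def pick_model_class(name_or_path, model_type=None, mapping=TF_TOKEN_CLASSIFICATION):
    # 1. Prefer the model_type recorded in the config when it is known.
    if model_type is not None:
        if model_type not in mapping:
            raise ValueError(f"Unsupported model_type: {model_type!r}")
        return mapping[model_type]
    # 2. Otherwise fall back to substring matching on the name or path.
    #    Longer keys are tried first so "xlm-roberta" beats "xlm" and "roberta",
    #    and "distilbert" or "albert" beats "bert".
    for pattern in sorted(mapping, key=len, reverse=True):
        if pattern in name_or_path:
            return mapping[pattern]
    raise ValueError(f"Could not infer a model type from {name_or_path!r}")


print(pick_model_class("FacebookAI/xlm-roberta-base"))
# prints TFXLMRobertaForTokenClassification
```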

Examples:

>>> from transformers import AutoConfig, TFAutoModelForTokenClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForTokenClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForTokenClassification

class transformers.FlaxAutoModelForTokenClassification

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

(**kwargs)

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForTokenClassification (ALBERT model)
    • BertConfig configuration class: FlaxBertForTokenClassification (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForTokenClassification (BigBird model)
    • DistilBertConfig configuration class: FlaxDistilBertForTokenClassification (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraForTokenClassification (ELECTRA model)
    • RoFormerConfig configuration class: FlaxRoFormerForTokenClassification (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForTokenClassification (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForTokenClassification (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a token classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForTokenClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForTokenClassification.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch checkpoint ahead of time with the provided conversion scripts and loading the converted model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have already been done).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
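The three accepted forms of pretrained_model_name_or_path, and the cases where config must be passed explicitly, can be approximated with a few filesystem checks. describe_source and needs_explicit_config are hypothetical helpers; the real resolution logic (caching, revisions, remote files) is more involved.

```python
import os
import tempfile


def describe_source(name_or_path):
    """Hypothetical helper: which of the three documented input forms is this?"""
    if os.path.isdir(name_or_path):
        return "directory"  # saved with save_pretrained()
    if os.path.isfile(name_or_path) or name_or_path.startswith(("http://", "https://")):
        return "file"  # a single PyTorch state_dict file (needs from_pt=True)
    return "hub_id"  # treated as a model id on huggingface.co


def needs_explicit_config(name_or_path):
    """True when config= must be supplied because no configuration can be found."""
    kind = describe_source(name_or_path)
    if kind == "directory":
        # A directory only works on its own if it contains config.json.
        return not os.path.isfile(os.path.join(name_or_path, "config.json"))
    # A bare weights file carries no configuration; a Hub model id normally does.
    return kind == "file"


with tempfile.TemporaryDirectory() as tmp:
    before = needs_explicit_config(tmp)  # empty directory: True
    with open(os.path.join(tmp, "config.json"), "w") as f:
        f.write("{}")
    after = needs_explicit_config(tmp)  # config.json present: False
```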

Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForTokenClassification (ALBERT model)
  • bert: FlaxBertForTokenClassification (BERT model)
  • big_bird: FlaxBigBirdForTokenClassification (BigBird model)
  • distilbert: FlaxDistilBertForTokenClassification (DistilBERT model)
  • electra: FlaxElectraForTokenClassification (ELECTRA model)
  • roberta: FlaxRobertaForTokenClassification (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForTokenClassification (RoFormer model)
  • xlm-roberta: FlaxXLMRobertaForTokenClassification (XLM-RoBERTa model)
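Because each backend has its own mapping, the same model_type can be supported by one auto class and not by another. The sets below are copied from the TF and Flax token classification lists on this page; token_classification_backends is a hypothetical helper, not a Transformers function.

```python
TF_TOKEN_CLASSIFICATION_TYPES = {
    "albert", "bert", "camembert", "convbert", "deberta", "deberta-v2",
    "distilbert", "electra", "esm", "flaubert", "funnel", "layoutlm",
    "layoutlmv3", "longformer", "mobilebert", "mpnet", "rembert", "roberta",
    "roberta-prelayernorm", "roformer", "xlm", "xlm-roberta", "xlnet",
}

FLAX_TOKEN_CLASSIFICATION_TYPES = {
    "albert", "bert", "big_bird", "distilbert", "electra", "roberta",
    "roberta-prelayernorm", "roformer", "xlm-roberta",
}


def token_classification_backends(model_type):
    """List the backends whose token-classification auto class accepts model_type."""
    backends = []
    if model_type in FLAX_TOKEN_CLASSIFICATION_TYPES:
        backends.append("flax")
    if model_type in TF_TOKEN_CLASSIFICATION_TYPES:
        backends.append("tf")
    return backends


print(token_classification_backends("xlnet"))     # prints ['tf']
print(token_classification_backends("big_bird"))  # prints ['flax']
```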

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForTokenClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForTokenClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForQuestionAnswering

class transformers.AutoModelForQuestionAnswering

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

(**kwargs)

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: AlbertForQuestionAnswering (ALBERT model)
    • ArceeConfig configuration class: ArceeForQuestionAnswering (Arcee model)
    • BartConfig configuration class: BartForQuestionAnswering (BART model)
    • BertConfig configuration class: BertForQuestionAnswering (BERT model)
    • BigBirdConfig configuration class: BigBirdForQuestionAnswering (BigBird model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusForQuestionAnswering (BigBird-Pegasus model)
    • BloomConfig configuration class: BloomForQuestionAnswering (BLOOM model)
    • CamembertConfig configuration class: CamembertForQuestionAnswering (CamemBERT model)
    • CanineConfig configuration class: CanineForQuestionAnswering (CANINE model)
    • ConvBertConfig configuration class: ConvBertForQuestionAnswering (ConvBERT model)
    • Data2VecTextConfig configuration class: Data2VecTextForQuestionAnswering (Data2VecText model)
    • DebertaConfig configuration class: DebertaForQuestionAnswering (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2ForQuestionAnswering (DeBERTa-v2 model)
    • DiffLlamaConfig configuration class: DiffLlamaForQuestionAnswering (DiffLlama model)
    • DistilBertConfig configuration class: DistilBertForQuestionAnswering (DistilBERT model)
    • ElectraConfig configuration class: ElectraForQuestionAnswering (ELECTRA model)
    • ErnieConfig configuration class: ErnieForQuestionAnswering (ERNIE model)
    • ErnieMConfig configuration class: ErnieMForQuestionAnswering (ErnieM model)
    • Exaone4Config configuration class: Exaone4ForQuestionAnswering (EXAONE-4.0 model)
    • FNetConfig configuration class: FNetForQuestionAnswering (FNet model)
    • FalconConfig configuration class: FalconForQuestionAnswering (Falcon model)
    • FlaubertConfig configuration class: FlaubertForQuestionAnsweringSimple (FlauBERT model)
    • FunnelConfig configuration class: FunnelForQuestionAnswering (Funnel Transformer model)
    • GPT2Config configuration class: GPT2ForQuestionAnswering (OpenAI GPT-2 model)
    • GPTJConfig configuration class: GPTJForQuestionAnswering (GPT-J model)
    • GPTNeoConfig configuration class: GPTNeoForQuestionAnswering (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXForQuestionAnswering (GPT NeoX model)
    • IBertConfig configuration class: IBertForQuestionAnswering (I-BERT model)
    • LEDConfig configuration class: LEDForQuestionAnswering (LED model)
    • LayoutLMv2Config configuration class: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
    • LiltConfig configuration class: LiltForQuestionAnswering (LiLT model)
    • LlamaConfig configuration class: LlamaForQuestionAnswering (LLaMA model)
    • LongformerConfig configuration class: LongformerForQuestionAnswering (Longformer model)
    • LukeConfig configuration class: LukeForQuestionAnswering (LUKE model)
    • LxmertConfig configuration class: LxmertForQuestionAnswering (LXMERT model)
    • MBartConfig configuration class: MBartForQuestionAnswering (mBART model)
    • MPNetConfig configuration class: MPNetForQuestionAnswering (MPNet model)
    • MT5Config configuration class: MT5ForQuestionAnswering (MT5 model)
    • MarkupLMConfig configuration class: MarkupLMForQuestionAnswering (MarkupLM model)
    • MegaConfig configuration class: MegaForQuestionAnswering (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertForQuestionAnswering (Megatron-BERT model)
    • MiniMaxConfig configuration class: MiniMaxForQuestionAnswering (MiniMax model)
    • MistralConfig configuration class: MistralForQuestionAnswering (Mistral model)
    • MixtralConfig configuration class: MixtralForQuestionAnswering (Mixtral model)
    • MobileBertConfig configuration class: MobileBertForQuestionAnswering (MobileBERT model)
    • ModernBertConfig configuration class: ModernBertForQuestionAnswering (ModernBERT model)
    • MptConfig configuration class: MptForQuestionAnswering (MPT model)
    • MraConfig configuration class: MraForQuestionAnswering (MRA model)
    • MvpConfig configuration class: MvpForQuestionAnswering (MVP model)
    • NemotronConfig configuration class: NemotronForQuestionAnswering (Nemotron model)
    • NezhaConfig configuration class: NezhaForQuestionAnswering (Nezha model)
    • NystromformerConfig configuration class: NystromformerForQuestionAnswering (Nyströmformer model)
    • OPTConfig configuration class: OPTForQuestionAnswering (OPT model)
    • QDQBertConfig configuration class: QDQBertForQuestionAnswering (QDQBert model)
    • Qwen2Config configuration class: Qwen2ForQuestionAnswering (Qwen2 model)
    • Qwen2MoeConfig configuration class: Qwen2MoeForQuestionAnswering (Qwen2MoE model)
    • Qwen3Config configuration class: Qwen3ForQuestionAnswering (Qwen3 model)
    • Qwen3MoeConfig configuration class: Qwen3MoeForQuestionAnswering (Qwen3MoE model)
    • ReformerConfig configuration class: ReformerForQuestionAnswering (Reformer model)
    • RemBertConfig configuration class: RemBertForQuestionAnswering (RemBERT model)
    • RoCBertConfig configuration class: RoCBertForQuestionAnswering (RoCBert model)
    • RoFormerConfig configuration class: RoFormerForQuestionAnswering (RoFormer model)
    • RobertaConfig configuration class: RobertaForQuestionAnswering (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
    • SeedOssConfig configuration class: SeedOssForQuestionAnswering (SeedOss model)
    • SmolLM3Config configuration class: SmolLM3ForQuestionAnswering (SmolLM3 model)
    • SplinterConfig configuration class: SplinterForQuestionAnswering (Splinter model)
    • SqueezeBertConfig configuration class: SqueezeBertForQuestionAnswering (SqueezeBERT model)
    • T5Config configuration class: T5ForQuestionAnswering (T5 model)
    • UMT5Config configuration class: UMT5ForQuestionAnswering (UMT5 model)
    • XLMConfig configuration class: XLMForQuestionAnsweringSimple (XLM model)
    • XLMRobertaConfig configuration class: XLMRobertaForQuestionAnswering (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLForQuestionAnswering (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetForQuestionAnsweringSimple (XLNet model)
    • XmodConfig configuration class: XmodForQuestionAnswering (X-MOD model)
    • YosoConfig configuration class: YosoForQuestionAnswering (YOSO model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
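The default described for attn_implementation can be sketched as follows. The version parser is deliberately simplified (it ignores pre-release tags), sdpa_available stands in for whatever availability check the library actually performs, and the sketch does not verify that flash-attention is installed when it is requested.

```python
def release_tuple(version):
    """Simplified parser: "2.3.0+cu121" -> (2, 3, 0).

    Local suffixes (+cu121) are dropped and pre-release tags such as "a0" or
    "rc1" are ignored, which a real version parser would not do.
    """
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


VALID_IMPLEMENTATIONS = {"eager", "sdpa", "flash_attention_2"}


def choose_attn_implementation(torch_version, sdpa_available, requested=None):
    if requested is not None:
        # An explicit choice wins, as long as it is one of the documented values.
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested
    # Default: SDPA when available on torch>=2.1.1, otherwise manual "eager".
    if sdpa_available and release_tuple(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(choose_attn_implementation("2.3.0+cu121", sdpa_available=True))  # prints sdpa
```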

Instantiates one of the model classes of the library (with a question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForQuestionAnswering.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code leaves in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
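The two kwargs rules above can be illustrated with a small pure-Python sketch. ToyConfig and split_kwargs are hypothetical names used only for illustration; they are not Transformers APIs, and the real routing inside from_pretrained() handles many more cases.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional, Tuple


@dataclass
class ToyConfig:
    # Hypothetical stand-in for a loaded PretrainedConfig.
    hidden_size: int = 768
    output_attentions: bool = False


def split_kwargs(
    config: Optional[ToyConfig], **kwargs: Any
) -> Tuple[ToyConfig, Dict[str, Any]]:
    """Return (config to use, kwargs forwarded to the model's __init__)."""
    if config is not None:
        # Rule 1: an explicit config is used as-is and every kwarg goes to __init__.
        return config, dict(kwargs)
    # Rule 2: keys naming a config attribute override the loaded config...
    config_keys = {f.name for f in fields(ToyConfig)}
    overrides = {k: v for k, v in kwargs.items() if k in config_keys}
    # ...and the remaining keys are forwarded to the model's __init__.
    remaining = {k: v for k, v in kwargs.items() if k not in config_keys}
    loaded = ToyConfig()  # pretend this was read from the checkpoint
    return replace(loaded, **overrides), remaining


cfg, rest = split_kwargs(None, output_attentions=True, use_cache=False)
print(cfg.output_attentions, rest)  # True {'use_cache': False}
```

Note the asymmetry: with an explicit config, output_attentions=True leaves the config untouched and is forwarded to the model's __init__ instead.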

Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: AlbertForQuestionAnswering (ALBERT model)
  • arcee: ArceeForQuestionAnswering (Arcee model)
  • bart: BartForQuestionAnswering (BART model)
  • bert: BertForQuestionAnswering (BERT model)
  • big_bird: BigBirdForQuestionAnswering (BigBird model)
  • bigbird_pegasus: BigBirdPegasusForQuestionAnswering (BigBird-Pegasus model)
  • bloom: BloomForQuestionAnswering (BLOOM model)
  • camembert: CamembertForQuestionAnswering (CamemBERT model)
  • canine: CanineForQuestionAnswering (CANINE model)
  • convbert: ConvBertForQuestionAnswering (ConvBERT model)
  • data2vec-text: Data2VecTextForQuestionAnswering (Data2VecText model)
  • deberta: DebertaForQuestionAnswering (DeBERTa model)
  • deberta-v2: DebertaV2ForQuestionAnswering (DeBERTa-v2 model)
  • diffllama: DiffLlamaForQuestionAnswering (DiffLlama model)
  • distilbert: DistilBertForQuestionAnswering (DistilBERT model)
  • electra: ElectraForQuestionAnswering (ELECTRA model)
  • ernie: ErnieForQuestionAnswering (ERNIE model)
  • ernie_m: ErnieMForQuestionAnswering (ErnieM model)
  • exaone4: Exaone4ForQuestionAnswering (EXAONE-4.0 model)
  • falcon: FalconForQuestionAnswering (Falcon model)
  • flaubert: FlaubertForQuestionAnsweringSimple (FlauBERT model)
  • fnet: FNetForQuestionAnswering (FNet model)
  • funnel: FunnelForQuestionAnswering (Funnel Transformer model)
  • gpt2: GPT2ForQuestionAnswering (OpenAI GPT-2 model)
  • gpt_neo: GPTNeoForQuestionAnswering (GPT Neo model)
  • gpt_neox: GPTNeoXForQuestionAnswering (GPT NeoX model)
  • gptj: GPTJForQuestionAnswering (GPT-J model)
  • ibert: IBertForQuestionAnswering (I-BERT model)
  • layoutlmv2: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
  • led: LEDForQuestionAnswering (LED model)
  • lilt: LiltForQuestionAnswering (LiLT model)
  • llama: LlamaForQuestionAnswering (LLaMA model)
  • longformer: LongformerForQuestionAnswering (Longformer model)
  • luke: LukeForQuestionAnswering (LUKE model)
  • lxmert: LxmertForQuestionAnswering (LXMERT model)
  • markuplm: MarkupLMForQuestionAnswering (MarkupLM model)
  • mbart: MBartForQuestionAnswering (mBART model)
  • mega: MegaForQuestionAnswering (MEGA model)
  • megatron-bert: MegatronBertForQuestionAnswering (Megatron-BERT model)
  • minimax: MiniMaxForQuestionAnswering (MiniMax model)
  • mistral: MistralForQuestionAnswering (Mistral model)
  • mixtral: MixtralForQuestionAnswering (Mixtral model)
  • mobilebert: MobileBertForQuestionAnswering (MobileBERT model)
  • modernbert: ModernBertForQuestionAnswering (ModernBERT model)
  • mpnet: MPNetForQuestionAnswering (MPNet model)
  • mpt: MptForQuestionAnswering (MPT model)
  • mra: MraForQuestionAnswering (MRA model)
  • mt5: MT5ForQuestionAnswering (MT5 model)
  • mvp: MvpForQuestionAnswering (MVP model)
  • nemotron: NemotronForQuestionAnswering (Nemotron model)
  • nezha: NezhaForQuestionAnswering (Nezha model)
  • nystromformer: NystromformerForQuestionAnswering (Nyströmformer model)
  • opt: OPTForQuestionAnswering (OPT model)
  • qdqbert: QDQBertForQuestionAnswering (QDQBert model)
  • qwen2: Qwen2ForQuestionAnswering (Qwen2 model)
  • qwen2_moe: Qwen2MoeForQuestionAnswering (Qwen2MoE model)
  • qwen3: Qwen3ForQuestionAnswering (Qwen3 model)
  • qwen3_moe: Qwen3MoeForQuestionAnswering (Qwen3MoE model)
  • reformer: ReformerForQuestionAnswering (Reformer model)
  • rembert: RemBertForQuestionAnswering (RemBERT model)
  • roberta: RobertaForQuestionAnswering (RoBERTa model)
  • roberta-prelayernorm: RobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
  • roc_bert: RoCBertForQuestionAnswering (RoCBert model)
  • roformer: RoFormerForQuestionAnswering (RoFormer model)
  • seed_oss: SeedOssForQuestionAnswering (SeedOss model)
  • smollm3: SmolLM3ForQuestionAnswering (SmolLM3 model)
  • splinter: SplinterForQuestionAnswering (Splinter model)
  • squeezebert: SqueezeBertForQuestionAnswering (SqueezeBERT model)
  • t5: T5ForQuestionAnswering (T5 model)
  • umt5: UMT5ForQuestionAnswering (UMT5 model)
  • xlm: XLMForQuestionAnsweringSimple (XLM model)
  • xlm-roberta: XLMRobertaForQuestionAnswering (XLM-RoBERTa model)
  • xlm-roberta-xl: XLMRobertaXLForQuestionAnswering (XLM-RoBERTa-XL model)
  • xlnet: XLNetForQuestionAnsweringSimple (XLNet model)
  • xmod: XmodForQuestionAnswering (X-MOD model)
  • yoso: YosoForQuestionAnswering (YOSO model)
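The selection rule above amounts to a table lookup with a name-based fallback. The sketch below assumes a hypothetical three-entry table and returns class names as strings to stay dependency-free; trying longer keys first is an assumption made here so that a name such as roberta-base is not matched by the shorter key bert, which it contains as a substring.

```python
from typing import Mapping, Optional

# Hypothetical three-entry excerpt of the model_type -> class table above.
QA_MAPPING = {
    "bert": "BertForQuestionAnswering",
    "roberta": "RobertaForQuestionAnswering",
    "xlm-roberta": "XLMRobertaForQuestionAnswering",
}


def select_class(
    name_or_path: str,
    model_type: Optional[str] = None,
    mapping: Mapping[str, str] = QA_MAPPING,
) -> str:
    if model_type is not None:
        # Preferred path: the config's model_type is an exact key.
        if model_type in mapping:
            return mapping[model_type]
        raise ValueError(f"Unrecognized model_type {model_type!r}")
    # Fallback: substring match on the name, longest keys first, so that
    # "xlm-roberta" beats "roberta" and "roberta" beats "bert".
    for pattern in sorted(mapping, key=len, reverse=True):
        if pattern in name_or_path:
            return mapping[pattern]
    raise ValueError(f"Cannot infer a model class from {name_or_path!r}")


print(select_class("./my-roberta-finetune"))  # RobertaForQuestionAnswering
```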

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForQuestionAnswering.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForQuestionAnswering

class transformers.TFAutoModelForQuestionAnswering

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
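This dispatch-only design can be sketched in plain Python. ToyAutoQA and ToyBertQA are hypothetical classes, and passing a checkpoint label to from_pretrained() is a simplification (the real method takes a model id or path). The sketch also reflects the distinction drawn for from_config(), which builds the architecture without loading weights.

```python
from typing import Any, Dict


class ToyBertQA:
    """Hypothetical concrete model; `weights` stands in for real parameters."""

    def __init__(self, config: Dict[str, Any], weights: str = "random-init") -> None:
        self.config = config
        self.weights = weights


class ToyAutoQA:
    # Hypothetical model_type -> concrete class table.
    _mapping = {"bert": ToyBertQA}

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # The auto class only dispatches, so direct construction is an error.
        raise EnvironmentError(
            "ToyAutoQA must be created with from_config() or from_pretrained()"
        )

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> ToyBertQA:
        # Architecture only: the concrete model keeps fresh random weights.
        return cls._mapping[config["model_type"]](config)

    @classmethod
    def from_pretrained(cls, config: Dict[str, Any], checkpoint: str) -> ToyBertQA:
        # Same dispatch, but the stored weights are loaded as well.
        return cls._mapping[config["model_type"]](config, weights=checkpoint)


model = ToyAutoQA.from_config({"model_type": "bert"})
print(type(model).__name__, model.weights)  # ToyBertQA random-init
```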

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: TFAlbertForQuestionAnswering (ALBERT model)
    • BertConfig configuration class: TFBertForQuestionAnswering (BERT model)
    • CamembertConfig configuration class: TFCamembertForQuestionAnswering (CamemBERT model)
    • ConvBertConfig configuration class: TFConvBertForQuestionAnswering (ConvBERT model)
    • DebertaConfig configuration class: TFDebertaForQuestionAnswering (DeBERTa model)
    • DebertaV2Config configuration class: TFDebertaV2ForQuestionAnswering (DeBERTa-v2 model)
    • DistilBertConfig configuration class: TFDistilBertForQuestionAnswering (DistilBERT model)
    • ElectraConfig configuration class: TFElectraForQuestionAnswering (ELECTRA model)
    • FlaubertConfig configuration class: TFFlaubertForQuestionAnsweringSimple (FlauBERT model)
    • FunnelConfig configuration class: TFFunnelForQuestionAnswering (Funnel Transformer model)
    • GPTJConfig configuration class: TFGPTJForQuestionAnswering (GPT-J model)
    • LayoutLMv3Config configuration class: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
    • LongformerConfig configuration class: TFLongformerForQuestionAnswering (Longformer model)
    • MPNetConfig configuration class: TFMPNetForQuestionAnswering (MPNet model)
    • MobileBertConfig configuration class: TFMobileBertForQuestionAnswering (MobileBERT model)
    • RemBertConfig configuration class: TFRemBertForQuestionAnswering (RemBERT model)
    • RoFormerConfig configuration class: TFRoFormerForQuestionAnswering (RoFormer model)
    • RobertaConfig configuration class: TFRobertaForQuestionAnswering (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
    • XLMConfig configuration class: TFXLMForQuestionAnsweringSimple (XLM model)
    • XLMRobertaConfig configuration class: TFXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
    • XLNetConfig configuration class: TFXLNetForQuestionAnsweringSimple (XLNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForQuestionAnswering.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been done already).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute is used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
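The order in which a configuration is found, as described for the config parameter above, can be sketched as follows. resolve_config is a hypothetical helper, not part of Transformers; it only distinguishes the three sources and leaves the actual Hub download out.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def resolve_config(
    name_or_path: str, config: Optional[Dict[str, Any]] = None
) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (source, config) following the cases listed for config."""
    if config is not None:
        # An explicitly passed config always takes precedence.
        return "explicit", config
    local = Path(name_or_path)
    if local.is_dir():
        # A local directory, e.g. one written by save_pretrained().
        config_file = local / "config.json"
        if not config_file.is_file():
            raise FileNotFoundError(f"no config.json in {local}")
        return "local", json.loads(config_file.read_text(encoding="utf-8"))
    # Otherwise treat the string as a Hub model id; downloading is not shown.
    return "hub", None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text('{"model_type": "bert"}', encoding="utf-8")
    print(resolve_config(tmp))  # ('local', {'model_type': 'bert'})
```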

Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: TFAlbertForQuestionAnswering (ALBERT model)
  • bert: TFBertForQuestionAnswering (BERT model)
  • camembert: TFCamembertForQuestionAnswering (CamemBERT model)
  • convbert: TFConvBertForQuestionAnswering (ConvBERT model)
  • deberta: TFDebertaForQuestionAnswering (DeBERTa model)
  • deberta-v2: TFDebertaV2ForQuestionAnswering (DeBERTa-v2 model)
  • distilbert: TFDistilBertForQuestionAnswering (DistilBERT model)
  • electra: TFElectraForQuestionAnswering (ELECTRA model)
  • flaubert: TFFlaubertForQuestionAnsweringSimple (FlauBERT model)
  • funnel: TFFunnelForQuestionAnswering (Funnel Transformer model)
  • gptj: TFGPTJForQuestionAnswering (GPT-J model)
  • layoutlmv3: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
  • longformer: TFLongformerForQuestionAnswering (Longformer model)
  • mobilebert: TFMobileBertForQuestionAnswering (MobileBERT model)
  • mpnet: TFMPNetForQuestionAnswering (MPNet model)
  • rembert: TFRemBertForQuestionAnswering (RemBERT model)
  • roberta: TFRobertaForQuestionAnswering (RoBERTa model)
  • roberta-prelayernorm: TFRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
  • roformer: TFRoFormerForQuestionAnswering (RoFormer model)
  • xlm: TFXLMForQuestionAnsweringSimple (XLM model)
  • xlm-roberta: TFXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
  • xlnet: TFXLNetForQuestionAnsweringSimple (XLNet model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForQuestionAnswering.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForQuestionAnswering

class transformers.FlaxAutoModelForQuestionAnswering

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AlbertConfig configuration class: FlaxAlbertForQuestionAnswering (ALBERT model)
    • BartConfig configuration class: FlaxBartForQuestionAnswering (BART model)
    • BertConfig configuration class: FlaxBertForQuestionAnswering (BERT model)
    • BigBirdConfig configuration class: FlaxBigBirdForQuestionAnswering (BigBird model)
    • DistilBertConfig configuration class: FlaxDistilBertForQuestionAnswering (DistilBERT model)
    • ElectraConfig configuration class: FlaxElectraForQuestionAnswering (ELECTRA model)
    • MBartConfig configuration class: FlaxMBartForQuestionAnswering (mBART model)
    • RoFormerConfig configuration class: FlaxRoFormerForQuestionAnswering (RoFormer model)
    • RobertaConfig configuration class: FlaxRobertaForQuestionAnswering (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
    • XLMRobertaConfig configuration class: FlaxXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForQuestionAnswering.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model once and loading the saved Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: models and other artifacts on huggingface.co are stored in a git-based system, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been done already).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute is used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
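What output_loading_info reports can be approximated as a set comparison between the parameter names the model expects and the names stored in the checkpoint. loading_info and the key names below are hypothetical; the library computes its report during actual weight loading and may include further entries.

```python
from typing import Dict, Iterable, List


def loading_info(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Sketch of the set comparison behind missing/unexpected key reports."""
    expected, found = set(model_keys), set(checkpoint_keys)
    return {
        # Needed by the model but absent from the checkpoint; these keep
        # their fresh initialization (typically a newly added task head).
        "missing_keys": sorted(expected - found),
        # Present in the checkpoint but unused by this model class.
        "unexpected_keys": sorted(found - expected),
        # Left empty here: this sketch does not actually load any tensors.
        "error_msgs": [],
    }


info = loading_info(
    model_keys=["encoder.weight", "qa_head.weight"],
    checkpoint_keys=["encoder.weight", "lm_head.weight"],
)
print(info["missing_keys"], info["unexpected_keys"])  # ['qa_head.weight'] ['lm_head.weight']
```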

Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert: FlaxAlbertForQuestionAnswering (ALBERT model)
  • bart: FlaxBartForQuestionAnswering (BART model)
  • bert: FlaxBertForQuestionAnswering (BERT model)
  • big_bird: FlaxBigBirdForQuestionAnswering (BigBird model)
  • distilbert: FlaxDistilBertForQuestionAnswering (DistilBERT model)
  • electra: FlaxElectraForQuestionAnswering (ELECTRA model)
  • mbart: FlaxMBartForQuestionAnswering (mBART model)
  • roberta: FlaxRobertaForQuestionAnswering (RoBERTa model)
  • roberta-prelayernorm: FlaxRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
  • roformer: FlaxRoFormerForQuestionAnswering (RoFormer model)
  • xlm-roberta: FlaxXLMRobertaForQuestionAnswering (XLM-RoBERTa model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForTextEncoding

class transformers.AutoModelForTextEncoding

(*args, **kwargs)

TFAutoModelForTextEncoding

class transformers.TFAutoModelForTextEncoding

(*args, **kwargs)

Computer vision

The following auto classes are available for computer vision tasks.

AutoModelForDepthEstimation

class transformers.AutoModelForDepthEstimation

(*args, **kwargs)

This is a generic model class that will be instantiated as one of the model classes of the library (with a depth estimation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • DPTConfig configuration class: DPTForDepthEstimation (DPT model)
    • DepthAnythingConfig configuration class: DepthAnythingForDepthEstimation (Depth Anything model)
    • DepthProConfig configuration class: DepthProForDepthEstimation (DepthPro model)
    • GLPNConfig configuration class: GLPNForDepthEstimation (GLPN model)
    • PromptDepthAnythingConfig configuration class: PromptDepthAnythingForDepthEstimation (PromptDepthAnything model)
    • ZoeDepthConfig configuration class: ZoeDepthForDepthEstimation (ZoeDepth model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
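The default described for attn_implementation can be summarized as a small decision function. resolve_attn_implementation and parse_release are hypothetical helpers, and the model_supports_sdpa flag is an assumption standing in for the per-model support check that "if available" refers to; the library performs further checks (for example, that flash-attention is installed) that are omitted here.

```python
import re
from typing import Optional, Tuple

VALID_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def parse_release(version: str) -> Tuple[int, ...]:
    # Keep the leading numeric release, e.g. "2.3.0+cu121" -> (2, 3, 0).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def resolve_attn_implementation(
    requested: Optional[str], torch_version: str, model_supports_sdpa: bool
) -> str:
    if requested is not None:
        # An explicit choice is honored as long as it is a known value.
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"unknown attn_implementation {requested!r}")
        return requested
    # Default: SDPA when the model supports it and torch >= 2.1.1, else eager.
    if model_supports_sdpa and parse_release(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(resolve_attn_implementation(None, "2.3.0+cu121", True))  # sdpa
```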

Instantiates one of the model classes of the library (with a depth estimation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForDepthEstimation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("Intel/dpt-large")
>>> model = AutoModelForDepthEstimation.from_config(config)

from_pretrained

(*model_args, **kwargs)

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, consider whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
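For the case where no config is passed, the kwargs routing described above can be sketched in plain Python. SketchConfig and split_kwargs are hypothetical names, not Transformers APIs; the sketch only mirrors the documented rule that keys matching a configuration attribute override that attribute, while the remaining keys are handed to the model's __init__.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Tuple


@dataclass
class SketchConfig:
    # Hypothetical stand-in for a loaded configuration object.
    hidden_size: int = 768
    output_attentions: bool = False


def split_kwargs(config: SketchConfig, **kwargs: Any) -> Tuple[SketchConfig, Dict[str, Any]]:
    """Hypothetical helper: apply kwargs that name a config attribute, return the rest."""
    config_keys = {f.name for f in fields(config)}
    overrides = {k: v for k, v in kwargs.items() if k in config_keys}
    model_kwargs = {k: v for k, v in kwargs.items() if k not in config_keys}
    # replace() builds a new config with the matching attributes overridden.
    return replace(config, **overrides), model_kwargs


config, model_kwargs = split_kwargs(SketchConfig(), output_attentions=True, my_custom_flag=1)
print(config.output_attentions)  # True
print(model_kwargs)  # {'my_custom_flag': 1}
```

Returning a new config through dataclasses.replace leaves the input object untouched; that is a choice of this sketch, not a statement about how the library updates its configuration internally.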

Instantiate one of the model classes of the library (with a depth estimation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • depth_anything: DepthAnythingForDepthEstimation (Depth Anything model)
  • depth_pro: DepthProForDepthEstimation (DepthPro model)
  • dpt: DPTForDepthEstimation (DPT model)
  • glpn: GLPNForDepthEstimation (GLPN model)
  • prompt_depth_anything: PromptDepthAnythingForDepthEstimation (PromptDepthAnything model)
  • zoedepth: ZoeDepthForDepthEstimation (ZoeDepth model)
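The selection rule above (use model_type when the config provides one, otherwise fall back to matching names in pretrained_model_name_or_path) can be sketched with a plain dictionary built from this list. resolve_class_name is a hypothetical helper that returns class names as strings. Trying longer keys first is a choice made in this sketch so that a path containing prompt_depth_anything is not captured by the shorter depth_anything key.

```python
from typing import Optional

# model_type -> class name, copied from the depth-estimation list above.
DEPTH_ESTIMATION_MAPPING = {
    "depth_anything": "DepthAnythingForDepthEstimation",
    "depth_pro": "DepthProForDepthEstimation",
    "dpt": "DPTForDepthEstimation",
    "glpn": "GLPNForDepthEstimation",
    "prompt_depth_anything": "PromptDepthAnythingForDepthEstimation",
    "zoedepth": "ZoeDepthForDepthEstimation",
}


def resolve_class_name(model_type: Optional[str], name_or_path: str) -> str:
    """Hypothetical helper illustrating model_type lookup with a name-matching fallback."""
    # 1. Prefer the explicit model_type from the config.
    if model_type is not None:
        return DEPTH_ESTIMATION_MAPPING[model_type]
    # 2. Otherwise, match keys as substrings of the name/path, longest key first,
    #    so "prompt_depth_anything" wins over "depth_anything".
    for key in sorted(DEPTH_ESTIMATION_MAPPING, key=len, reverse=True):
        if key in name_or_path:
            return DEPTH_ESTIMATION_MAPPING[key]
    raise ValueError(f"Cannot infer a model class from {name_or_path!r}")


print(resolve_class_name("dpt", "Intel/dpt-large"))  # DPTForDepthEstimation
print(resolve_class_name(None, "./checkpoints/prompt_depth_anything_v1"))  # PromptDepthAnythingForDepthEstimation
```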

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForDepthEstimation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForDepthEstimation.from_pretrained("Intel/dpt-large")

>>> # Update configuration during loading
>>> model = AutoModelForDepthEstimation.from_pretrained("Intel/dpt-large", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForDepthEstimation.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForImageClassification

class transformers.AutoModelForImageClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BeitConfig configuration class: BeitForImageClassification (BEiT model)
    • BitConfig configuration class: BitForImageClassification (BiT model)
    • CLIPConfig configuration class: CLIPForImageClassification (CLIP model)
    • ConvNextConfig configuration class: ConvNextForImageClassification (ConvNeXT model)
    • ConvNextV2Config configuration class: ConvNextV2ForImageClassification (ConvNeXTV2 model)
    • CvtConfig configuration class: CvtForImageClassification (CvT model)
    • Data2VecVisionConfig configuration class: Data2VecVisionForImageClassification (Data2VecVision model)
    • DeiTConfig configuration class: DeiTForImageClassification or DeiTForImageClassificationWithTeacher (DeiT model)
    • DinatConfig configuration class: DinatForImageClassification (DiNAT model)
    • Dinov2Config configuration class: Dinov2ForImageClassification (DINOv2 model)
    • Dinov2WithRegistersConfig configuration class: Dinov2WithRegistersForImageClassification (DINOv2 with Registers model)
    • DonutSwinConfig configuration class: DonutSwinForImageClassification (DonutSwin model)
    • EfficientFormerConfig configuration class: EfficientFormerForImageClassification or EfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
    • EfficientNetConfig configuration class: EfficientNetForImageClassification (EfficientNet model)
    • FocalNetConfig configuration class: FocalNetForImageClassification (FocalNet model)
    • HGNetV2Config configuration class: HGNetV2ForImageClassification (HGNet-V2 model)
    • HieraConfig configuration class: HieraForImageClassification (Hiera model)
    • IJepaConfig configuration class: IJepaForImageClassification (I-JEPA model)
    • ImageGPTConfig configuration class: ImageGPTForImageClassification (ImageGPT model)
    • LevitConfig configuration class: LevitForImageClassification or LevitForImageClassificationWithTeacher (LeViT model)
    • MetaClip2Config configuration class: MetaClip2ForImageClassification (MetaCLIP 2 model)
    • MobileNetV1Config configuration class: MobileNetV1ForImageClassification (MobileNetV1 model)
    • MobileNetV2Config configuration class: MobileNetV2ForImageClassification (MobileNetV2 model)
    • MobileViTConfig configuration class: MobileViTForImageClassification (MobileViT model)
    • MobileViTV2Config configuration class: MobileViTV2ForImageClassification (MobileViTV2 model)
    • NatConfig configuration class: NatForImageClassification (NAT model)
    • PerceiverConfig configuration class: PerceiverForImageClassificationLearned or PerceiverForImageClassificationFourier or PerceiverForImageClassificationConvProcessing (Perceiver model)
    • PoolFormerConfig configuration class: PoolFormerForImageClassification (PoolFormer model)
    • PvtConfig configuration class: PvtForImageClassification (PVT model)
    • PvtV2Config configuration class: PvtV2ForImageClassification (PVTv2 model)
    • RegNetConfig configuration class: RegNetForImageClassification (RegNet model)
    • ResNetConfig configuration class: ResNetForImageClassification (ResNet model)
    • SegformerConfig configuration class: SegformerForImageClassification (SegFormer model)
    • ShieldGemma2Config configuration class: ShieldGemma2ForImageClassification (Shieldgemma2 model)
    • Siglip2Config configuration class: Siglip2ForImageClassification (SigLIP2 model)
    • SiglipConfig configuration class: SiglipForImageClassification (SigLIP model)
    • SwiftFormerConfig configuration class: SwiftFormerForImageClassification (SwiftFormer model)
    • SwinConfig configuration class: SwinForImageClassification (Swin Transformer model)
    • Swinv2Config configuration class: Swinv2ForImageClassification (Swin Transformer V2 model)
    • TextNetConfig configuration class: TextNetForImageClassification (TextNet model)
    • TimmWrapperConfig configuration class: TimmWrapperForImageClassification (TimmWrapperModel model)
    • VanConfig configuration class: VanForImageClassification (VAN model)
    • ViTConfig configuration class: ViTForImageClassification (ViT model)
    • ViTHybridConfig configuration class: ViTHybridForImageClassification (ViT Hybrid model)
    • ViTMSNConfig configuration class: ViTMSNForImageClassification (ViTMSN model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
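Several entries above (DeiT, EfficientFormer, LeViT, Perceiver) list more than one class. The list does not say how one is chosen; the sketch below assumes the choice follows the config's architectures field and otherwise falls back to the first listed class. DeiTConfigStub, MAPPING, and pick_model_class are hypothetical names used only for illustration; the lookup is keyed on the config's class, as the list above describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeiTConfigStub:
    # Hypothetical stand-in for a config class; only `architectures` matters here.
    architectures: List[str] = field(default_factory=list)


# Config class -> candidate model class names, as in the DeiT entry above.
MAPPING: Dict[type, Tuple[str, ...]] = {
    DeiTConfigStub: ("DeiTForImageClassification", "DeiTForImageClassificationWithTeacher"),
}


def pick_model_class(config: object) -> str:
    """Hypothetical helper: resolve one class name when several candidates are listed."""
    candidates = MAPPING[type(config)]  # lookup is keyed on the config's class
    # Assumption: prefer a candidate named in config.architectures ...
    for arch in getattr(config, "architectures", None) or []:
        if arch in candidates:
            return arch
    # ... otherwise fall back to the first listed class.
    return candidates[0]


print(pick_model_class(DeiTConfigStub()))  # DeiTForImageClassification
print(pick_model_class(DeiTConfigStub(["DeiTForImageClassificationWithTeacher"])))  # DeiTForImageClassificationWithTeacher
```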

Instantiates one of the model classes of the library (with an image classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForImageClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/vit-base-patch16-224")
>>> model = AutoModelForImageClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from a saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
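With output_loading_info=True, from_pretrained also returns a dictionary describing the load. As a minimal sketch, assuming missing and unexpected keys are simple set differences between the parameter names the model expects and the names found in the checkpoint (or in a user-supplied state_dict), they can be computed as below. loading_info and the parameter names are hypothetical, and the error messages the real dictionary also reports are omitted.

```python
from typing import Dict, Iterable, List


def loading_info(expected: Iterable[str], checkpoint: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: compare expected parameter names with checkpoint names."""
    expected_set, checkpoint_set = set(expected), set(checkpoint)
    return {
        # Parameters the model needs but the checkpoint lacks (left newly initialized).
        "missing_keys": sorted(expected_set - checkpoint_set),
        # Checkpoint entries the model has no parameter for (not used).
        "unexpected_keys": sorted(checkpoint_set - expected_set),
    }


model_params = ["vit.embeddings.cls_token", "vit.encoder.layer.0.weight", "classifier.weight", "classifier.bias"]
ckpt_params = ["vit.embeddings.cls_token", "vit.encoder.layer.0.weight", "pooler.dense.weight"]
print(loading_info(model_params, ckpt_params))
# {'missing_keys': ['classifier.bias', 'classifier.weight'], 'unexpected_keys': ['pooler.dense.weight']}
```

A freshly added classification head is a typical source of missing keys when a backbone checkpoint is loaded into a model with a new task head.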

Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • beit: BeitForImageClassification (BEiT model)
  • bit: BitForImageClassification (BiT model)
  • clip: CLIPForImageClassification (CLIP model)
  • convnext: ConvNextForImageClassification (ConvNeXT model)
  • convnextv2: ConvNextV2ForImageClassification (ConvNeXTV2 model)
  • cvt: CvtForImageClassification (CvT model)
  • data2vec-vision: Data2VecVisionForImageClassification (Data2VecVision model)
  • deit: DeiTForImageClassification or DeiTForImageClassificationWithTeacher (DeiT model)
  • dinat: DinatForImageClassification (DiNAT model)
  • dinov2: Dinov2ForImageClassification (DINOv2 model)
  • dinov2_with_registers: Dinov2WithRegistersForImageClassification (DINOv2 with Registers model)
  • donut-swin: DonutSwinForImageClassification (DonutSwin model)
  • efficientformer: EfficientFormerForImageClassification or EfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
  • efficientnet: EfficientNetForImageClassification (EfficientNet model)
  • focalnet: FocalNetForImageClassification (FocalNet model)
  • hgnet_v2: HGNetV2ForImageClassification (HGNet-V2 model)
  • hiera: HieraForImageClassification (Hiera model)
  • ijepa: IJepaForImageClassification (I-JEPA model)
  • imagegpt: ImageGPTForImageClassification (ImageGPT model)
  • levit: LevitForImageClassification or LevitForImageClassificationWithTeacher (LeViT model)
  • metaclip_2: MetaClip2ForImageClassification (MetaCLIP 2 model)
  • mobilenet_v1: MobileNetV1ForImageClassification (MobileNetV1 model)
  • mobilenet_v2: MobileNetV2ForImageClassification (MobileNetV2 model)
  • mobilevit: MobileViTForImageClassification (MobileViT model)
  • mobilevitv2: MobileViTV2ForImageClassification (MobileViTV2 model)
  • nat: NatForImageClassification (NAT model)
  • perceiver: PerceiverForImageClassificationLearned or PerceiverForImageClassificationFourier or PerceiverForImageClassificationConvProcessing (Perceiver model)
  • poolformer: PoolFormerForImageClassification (PoolFormer model)
  • pvt: PvtForImageClassification (PVT model)
  • pvt_v2: PvtV2ForImageClassification (PVTv2 model)
  • regnet: RegNetForImageClassification (RegNet model)
  • resnet: ResNetForImageClassification (ResNet model)
  • segformer: SegformerForImageClassification (SegFormer model)
  • shieldgemma2: ShieldGemma2ForImageClassification (Shieldgemma2 model)
  • siglip: SiglipForImageClassification (SigLIP model)
  • siglip2: Siglip2ForImageClassification (SigLIP2 model)
  • swiftformer: SwiftFormerForImageClassification (SwiftFormer model)
  • swin: SwinForImageClassification (Swin Transformer model)
  • swinv2: Swinv2ForImageClassification (Swin Transformer V2 model)
  • textnet: TextNetForImageClassification (TextNet model)
  • timm_wrapper: TimmWrapperForImageClassification (TimmWrapperModel model)
  • van: VanForImageClassification (VAN model)
  • vit: ViTForImageClassification (ViT model)
  • vit_hybrid: ViTHybridForImageClassification (ViT Hybrid model)
  • vit_msn: ViTMSNForImageClassification (ViTMSN model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForImageClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

>>> # Update configuration during loading
>>> model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForImageClassification

class transformers.TFAutoModelForImageClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an image classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForImageClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/vit-base-patch16-224")
>>> model = TFAutoModelForImageClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
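A proxies dictionary can mix protocol keys ('http') with protocol-plus-host keys ('http://hostname'). The sketch below shows one way such a dictionary could resolve for a given URL, with the more specific 'scheme://host' key taking precedence over the bare scheme. That precedence is an assumption modeled on how the requests library selects proxies, and pick_proxy is a hypothetical helper, not part of Transformers.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def pick_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: choose a proxy for `url`, most specific key first."""
    parts = urlparse(url)
    # Try "scheme://hostname" before the bare "scheme".
    for key in (f"{parts.scheme}://{parts.hostname}", parts.scheme):
        if key in proxies:
            return proxies[key]
    return None  # no proxy configured for this URL


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(pick_proxy("http://hostname/model.bin", proxies))  # foo.bar:4012
print(pick_proxy("http://example.com/config.json", proxies))  # foo.bar:3128
print(pick_proxy("https://huggingface.co/", proxies))  # None
```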

Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, TFAutoModelForImageClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

>>> # Update configuration during loading
>>> model = TFAutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForImageClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForImageClassification

class transformers.FlaxAutoModelForImageClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BeitConfig configuration class: FlaxBeitForImageClassification (BEiT model)
    • Dinov2Config configuration class: FlaxDinov2ForImageClassification (DINOv2 model)
    • RegNetConfig configuration class: FlaxRegNetForImageClassification (RegNet model)
    • ResNetConfig configuration class: FlaxResNetForImageClassification (ResNet model)
    • ViTConfig configuration class: FlaxViTForImageClassification (ViT model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an image classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForImageClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/vit-base-patch16-224")
>>> model = FlaxAutoModelForImageClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a Flax model ahead of time and loading the Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and whose code you have read, as it will execute code from the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored in a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • beit: FlaxBeitForImageClassification (BEiT model)
  • dinov2: FlaxDinov2ForImageClassification (DINOv2 model)
  • regnet: FlaxRegNetForImageClassification (RegNet model)
  • resnet: FlaxResNetForImageClassification (ResNet model)
  • vit: FlaxViTForImageClassification (ViT model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForImageClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForImageClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForVideoClassification

class transformers.AutoModelForVideoClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a video classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • TimesformerConfig configuration class: TimesformerForVideoClassification (TimeSformer model)
    • VJEPA2Config configuration class: VJEPA2ForVideoClassification (VJEPA2Model model)
    • VideoMAEConfig configuration class: VideoMAEForVideoClassification (VideoMAE model)
    • VivitConfig configuration class: VivitForVideoClassification (ViViT model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
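The default for attn_implementation can be written as a small decision function. This is a hypothetical sketch of the stated rule only. `model_supports_sdpa` stands in for "if available", and the version parsing is a simplification that keeps only the first three numeric fields.

```python
import re


def default_attn_implementation(torch_version: str, model_supports_sdpa: bool) -> str:
    """Documented default: "sdpa" when available and torch>=2.1.1, otherwise "eager"."""
    # Keep the first three numeric fields: "2.1.1+cu121" -> (2, 1, 1), "2.3" -> (2, 3, 0).
    numbers = [int(n) for n in re.findall(r"\d+", torch_version)[:3]]
    numbers += [0] * (3 - len(numbers))
    if model_supports_sdpa and tuple(numbers) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(default_attn_implementation("2.1.0", True))        # eager
print(default_attn_implementation("2.4.1+cu121", True))  # sdpa
```

The default rule never selects "flash_attention_2", so it has to be requested explicitly. For example, call from_config(config, attn_implementation="flash_attention_2"), which also requires the Dao-AILab/flash-attention package.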

Instantiates one of the model classes of the library (with a video classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForVideoClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")
>>> model = AutoModelForVideoClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
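The third condition listed under config is a local directory that contains config.json. It can be checked with the standard library alone. The sketch below uses hypothetical helpers `can_autoload_config` and `read_model_type`. The model_type entry it reads is the one the class selection relies on.

```python
import json
import os
import tempfile
from typing import Optional


def can_autoload_config(path: str) -> bool:
    """True if `path` is a local directory that contains a config.json file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json"))


def read_model_type(path: str) -> Optional[str]:
    """Read the model_type entry from <path>/config.json, if present."""
    with open(os.path.join(path, "config.json"), encoding="utf-8") as f:
        return json.load(f).get("model_type")


with tempfile.TemporaryDirectory() as save_dir:
    before = can_autoload_config(save_dir)  # empty directory: no config to find
    with open(os.path.join(save_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"model_type": "videomae"}, f)
    after = can_autoload_config(save_dir)
    model_type = read_model_type(save_dir)

print(before, after, model_type)  # False True videomae
```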

Instantiate one of the model classes of the library (with a video classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • timesformer: TimesformerForVideoClassification (TimeSformer model)
  • videomae: VideoMAEForVideoClassification (VideoMAE model)
  • vivit: VivitForVideoClassification (ViViT model)
  • vjepa2: VJEPA2ForVideoClassification (VJEPA2Model model)

The model is set to evaluation mode by default with model.eval() (for instance, dropout modules are deactivated). To train the model, first switch it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForVideoClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")

>>> # Update configuration during loading
>>> model = AutoModelForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForVideoClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForMaskedImageModeling

class transformers.AutoModelForMaskedImageModeling

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a masked image modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • DeiTConfig configuration class: DeiTForMaskedImageModeling (DeiT model)
    • FocalNetConfig configuration class: FocalNetForMaskedImageModeling (FocalNet model)
    • SwinConfig configuration class: SwinForMaskedImageModeling (Swin Transformer model)
    • Swinv2Config configuration class: Swinv2ForMaskedImageModeling (Swin Transformer V2 model)
    • ViTConfig configuration class: ViTForMaskedImageModeling (ViT model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
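from_config() selects the model class from the configuration class itself, as the list above shows. The minimal sketch below shows that dispatch with hypothetical `Toy*` stand-ins, not library classes. It keys on the exact type of the config object. Its `register` helper mirrors the idea behind AutoModel.register() but is not that method's implementation.

```python
class ToyViTConfig:
    """Hypothetical stand-in for ViTConfig."""


class ToySwinConfig:
    """Hypothetical stand-in for SwinConfig."""


class ToyViTForMaskedImageModeling:
    def __init__(self, config):
        # Built from the config alone; no weights are read, as with from_config().
        self.config = config


class ToySwinForMaskedImageModeling:
    def __init__(self, config):
        self.config = config


# Configuration class -> model class, like the list above.
MODEL_MAPPING = {
    ToyViTConfig: ToyViTForMaskedImageModeling,
    ToySwinConfig: ToySwinForMaskedImageModeling,
}


def from_config(config):
    """Instantiate the model class registered for the exact type of `config`."""
    model_cls = MODEL_MAPPING.get(type(config))
    if model_cls is None:
        raise ValueError(f"Unrecognized configuration class {type(config).__name__}")
    return model_cls(config)


def register(config_cls, model_cls):
    """Add a new configuration class -> model class pair."""
    MODEL_MAPPING[config_cls] = model_cls


model = from_config(ToySwinConfig())
print(type(model).__name__)  # ToySwinForMaskedImageModeling
```

A custom pair, such as a hypothetical `MyConfig`/`MyModel`, only resolves after `register(MyConfig, MyModel)` has been called.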

Instantiates one of the model classes of the library (with a masked image modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForMaskedImageModeling

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/vit-base-patch16-224-in21k")
>>> model = AutoModelForMaskedImageModeling.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
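The proxies example mixes a scheme-wide key ('http') with a host-specific one ('http://hostname'). The sketch below resolves such a dictionary the way requests.utils.select_proxy does, trying the most specific key first. It is an assumption that the Hub download client resolves the keys with exactly this precedence. `select_proxy` here is a local re-implementation, not an import.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def select_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    """Return the proxy for `url`, most specific key first (requests-style precedence)."""
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # scheme + host, e.g. 'http://hostname'
        parts.scheme,                          # scheme only, e.g. 'http'
        f"all://{parts.hostname}",             # any scheme, this host
        "all",                                 # catch-all
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/file", proxies))    # foo.bar:4012
print(select_proxy("http://other.host/file", proxies))  # foo.bar:3128
print(select_proxy("https://huggingface.co", proxies))  # None
```

Under this precedence, an 'http' key does not cover https:// URLs, so HTTPS traffic to the Hub would need its own 'https' entry.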

Instantiate one of the model classes of the library (with a masked image modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • deit: DeiTForMaskedImageModeling (DeiT model)
  • focalnet: FocalNetForMaskedImageModeling (FocalNet model)
  • swin: SwinForMaskedImageModeling (Swin Transformer model)
  • swinv2: Swinv2ForMaskedImageModeling (Swin Transformer V2 model)
  • vit: ViTForMaskedImageModeling (ViT model)

The model is set to evaluation mode by default with model.eval() (for instance, dropout modules are deactivated). To train the model, first switch it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForMaskedImageModeling

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMaskedImageModeling.from_pretrained("google/vit-base-patch16-224-in21k")

>>> # Update configuration during loading
>>> model = AutoModelForMaskedImageModeling.from_pretrained("google/vit-base-patch16-224-in21k", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMaskedImageModeling.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForMaskedImageModeling

class transformers.TFAutoModelForMaskedImageModeling

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a masked image modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a masked image modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMaskedImageModeling

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForMaskedImageModeling.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model with the provided conversion scripts and then loading the TensorFlow model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
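The single-file checkpoint cases each carry two requirements: the matching from_* flag and an explicit config. This covers a PyTorch state_dict loaded by this TensorFlow class and a TensorFlow index file loaded by the PyTorch classes. Below is a hypothetical pre-flight check keyed on the file suffixes used in the examples (.bin and .index). The helper and its suffix rules are illustrative and not part of the library.

```python
# File suffix -> flag the docs require for that single-file checkpoint (illustrative rules).
SINGLE_FILE_RULES = {
    ".bin": "from_pt",    # PyTorch state_dict, e.g. ./pt_model/pytorch_model.bin
    ".index": "from_tf",  # TensorFlow index checkpoint, e.g. ./tf_model/model.ckpt.index
}


def check_single_file_args(path, config=None, **flags):
    """Raise if a single-file checkpoint lacks its from_* flag or config; return the flag used."""
    for suffix, flag in SINGLE_FILE_RULES.items():
        if path.endswith(suffix):
            if not flags.get(flag, False):
                raise ValueError(f"{path!r} requires {flag}=True")
            if config is None:
                raise ValueError(f"{path!r} requires an explicit config")
            return flag
    return None  # a model id or a directory: no extra requirements in this sketch


print(check_single_file_args("./pt_model/pytorch_model.bin", config={"model_type": "swin"}, from_pt=True))  # from_pt
```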

Instantiate one of the model classes of the library (with a masked image modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, TFAutoModelForMaskedImageModeling

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForObjectDetection

class transformers.AutoModelForObjectDetection

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an object detection head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an object detection head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForObjectDetection

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("hustvl/yolos-tiny")
>>> model = AutoModelForObjectDetection.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, check whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
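With output_loading_info=True, from_pretrained() returns the model together with a dictionary describing mismatches between the model's parameters and the checkpoint. In real use, that looks like `model, info = AutoModelForObjectDetection.from_pretrained(name_or_path, output_loading_info=True)`. At its core, the split into missing and unexpected keys is a pair of set differences. The sketch uses a hypothetical `loading_info` helper and made-up parameter names. Its dictionary keys only echo the parameter description, and the library's actual dictionary may contain more entries.

```python
def loading_info(model_keys, checkpoint_keys):
    """Compare parameter names expected by a model with those stored in a checkpoint."""
    expected, found = set(model_keys), set(checkpoint_keys)
    return {
        "missing_keys": sorted(expected - found),     # expected by the model, absent from the checkpoint
        "unexpected_keys": sorted(found - expected),  # present in the checkpoint, unused by the model
        "error_msgs": [],                             # this sketch never records errors
    }


model_keys = ["backbone.conv1.weight", "encoder.layer.0.weight", "class_head.weight"]
checkpoint_keys = ["backbone.conv1.weight", "encoder.layer.0.weight", "pooler.weight"]
info = loading_info(model_keys, checkpoint_keys)
print(info["missing_keys"], info["unexpected_keys"])  # ['class_head.weight'] ['pooler.weight']
```

A non-empty missing_keys list usually means a new head is being trained from scratch on top of a pretrained body. The missing keys there are the weights the checkpoint could not supply.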

Instantiate one of the model classes of the library (with an object detection head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • conditional_detr: ConditionalDetrForObjectDetection (Conditional DETR model)
  • d_fine: DFineForObjectDetection (D-FINE model)
  • dab-detr: DabDetrForObjectDetection (DAB-DETR model)
  • deformable_detr: DeformableDetrForObjectDetection (Deformable DETR model)
  • deta: DetaForObjectDetection (DETA model)
  • detr: DetrForObjectDetection (DETR model)
  • rt_detr: RTDetrForObjectDetection (RT-DETR model)
  • rt_detr_v2: RTDetrV2ForObjectDetection (RT-DETRv2 model)
  • table-transformer: TableTransformerForObjectDetection (Table Transformer model)
  • yolos: YolosForObjectDetection (YOLOS model)
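When the config has no model_type, the fallback is a substring test against pretrained_model_name_or_path. Overlapping keys such as detr, rt_detr and rt_detr_v2 make the order of those tests matter. The sketch below tries longer keys first and uses only the keys listed above. The library's own fallback (in AutoConfig) matches against every registered model type, and we believe it also goes from longer to shorter names. Treat the exact procedure as an assumption. `guess_model_type` is a hypothetical helper.

```python
from typing import Optional

# model_type keys from the object detection list above.
OBJECT_DETECTION_TYPES = [
    "conditional_detr", "d_fine", "dab-detr", "deformable_detr", "deta",
    "detr", "rt_detr", "rt_detr_v2", "table-transformer", "yolos",
]


def guess_model_type(name_or_path: str) -> Optional[str]:
    """Return the first key found in `name_or_path`, trying longer keys before shorter ones."""
    for pattern in sorted(OBJECT_DETECTION_TYPES, key=len, reverse=True):
        if pattern in name_or_path:
            return pattern
    return None


name = "my-org/rt_detr_v2_r18vd"
naive = next((p for p in OBJECT_DETECTION_TYPES if p in name), None)
print(guess_model_type(name), naive)  # rt_detr_v2 detr
```

Scanning in list order stops at "detr", which is a substring of "rt_detr_v2". Because the test is a plain substring check, a hyphenated name such as ./my-deformable-detr-run does not contain deformable_detr and falls through to detr. A config that stores its model_type avoids the guess entirely.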

The model is set to evaluation mode by default with model.eval() (for instance, dropout modules are deactivated). To train the model, first switch it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForObjectDetection

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForObjectDetection.from_pretrained("hustvl/yolos-tiny")

>>> # Update configuration during loading
>>> model = AutoModelForObjectDetection.from_pretrained("hustvl/yolos-tiny", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForObjectDetection.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForImageSegmentation

class transformers.AutoModelForImageSegmentation

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an image segmentation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an image segmentation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForImageSegmentation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("facebook/detr-resnet-50-panoptic")
>>> model = AutoModelForImageSegmentation.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with an image segmentation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForImageSegmentation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic")

>>> # Update configuration during loading
>>> model = AutoModelForImageSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageSegmentation.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForImageToImage

class transformers.AutoModelForImageToImage

( *args **kwargs )

AutoModelForSemanticSegmentation

class transformers.AutoModelForSemanticSegmentation

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a semantic segmentation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • BeitConfig configuration class: BeitForSemanticSegmentation (BEiT model)
    • DPTConfig configuration class: DPTForSemanticSegmentation (DPT model)
    • Data2VecVisionConfig configuration class: Data2VecVisionForSemanticSegmentation (Data2VecVision model)
    • MobileNetV2Config configuration class: MobileNetV2ForSemanticSegmentation (MobileNetV2 model)
    • MobileViTConfig configuration class: MobileViTForSemanticSegmentation (MobileViT model)
    • MobileViTV2Config configuration class: MobileViTV2ForSemanticSegmentation (MobileViTV2 model)
    • SegformerConfig configuration class: SegformerForSemanticSegmentation (SegFormer model)
    • UperNetConfig configuration class: UperNetForSemanticSegmentation (UPerNet model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a semantic segmentation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
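The note above can be checked without any download by building a configuration in code. This sketch assumes PyTorch and Transformers are installed; num_labels=3 is an arbitrary illustrative value. The head class is picked from the configuration class, as in the list above, but every weight is freshly (randomly) initialized, so the model still needs training or a from_pretrained() call before it is useful.

```python
from transformers import AutoModelForSemanticSegmentation, SegformerConfig

# Build a configuration locally (no download); num_labels=3 is arbitrary.
config = SegformerConfig(num_labels=3)

# from_config() picks the head class from the config class; weights are random.
model = AutoModelForSemanticSegmentation.from_config(config)
print(type(model).__name__)  # SegformerForSemanticSegmentation
```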

Examples:

>>> from transformers import AutoConfig, AutoModelForSemanticSegmentation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
>>> model = AutoModelForSemanticSegmentation.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
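Several of the options above can be exercised offline with a directory written by save_pretrained(). In this sketch a randomly initialized SegFormer (num_labels=3 is arbitrary) stands in for a fine-tuned checkpoint, which is an assumption for illustration. Because config.json is saved next to the weights, no config argument is needed; local_files_only=True keeps the call from contacting huggingface.co, and output_loading_info=True returns the loading report alongside the model.

```python
import os
import tempfile

import torch
from transformers import AutoModelForSemanticSegmentation, SegformerConfig

# Randomly initialized model standing in for a fine-tuned checkpoint.
model = AutoModelForSemanticSegmentation.from_config(SegformerConfig(num_labels=3))

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)
    # save_pretrained() writes config.json alongside the weights.
    has_config = os.path.isfile(os.path.join(save_dir, "config.json"))

    reloaded, loading_info = AutoModelForSemanticSegmentation.from_pretrained(
        save_dir,
        local_files_only=True,     # only look at local files
        output_loading_info=True,  # also return missing/unexpected keys and error messages
    )

    # Every saved tensor should come back unchanged.
    reloaded_state = reloaded.state_dict()
    same_weights = all(
        torch.equal(tensor, reloaded_state[name]) for name, tensor in model.state_dict().items()
    )

print(type(reloaded).__name__)  # SegformerForSemanticSegmentation
print(reloaded.training)        # False, because from_pretrained() calls eval()
print(sorted(loading_info))
```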

Instantiate one of the model classes of the library (with a semantic segmentation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • beit: BeitForSemanticSegmentation (BEiT model)
  • data2vec-vision: Data2VecVisionForSemanticSegmentation (Data2VecVision model)
  • dpt: DPTForSemanticSegmentation (DPT model)
  • mobilenet_v2: MobileNetV2ForSemanticSegmentation (MobileNetV2 model)
  • mobilevit: MobileViTForSemanticSegmentation (MobileViT model)
  • mobilevitv2: MobileViTV2ForSemanticSegmentation (MobileViTV2 model)
  • segformer: SegformerForSemanticSegmentation (SegFormer model)
  • upernet: UperNetForSemanticSegmentation (UPerNet model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForSemanticSegmentation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

>>> # Update configuration during loading
>>> model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSemanticSegmentation.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
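A further illustration of the configuration-override rule, runnable offline: a saved three-label SegFormer is reloaded with num_labels=5. Because num_labels is a configuration attribute, the keyword overrides the saved value. The classification layer then no longer matches the checkpoint, and ignore_mismatched_sizes=True (a from_pretrained() keyword not listed above) asks the library to re-initialize that layer instead of raising an error. The SegFormer choice and the label counts are arbitrary assumptions of this sketch.

```python
import tempfile

from transformers import AutoModelForSemanticSegmentation, SegformerConfig

with tempfile.TemporaryDirectory() as save_dir:
    # Save a randomly initialized 3-label model as a stand-in checkpoint.
    AutoModelForSemanticSegmentation.from_config(SegformerConfig(num_labels=3)).save_pretrained(save_dir)

    # num_labels names a config attribute, so it overrides the saved value (3 -> 5).
    # The classifier shape now differs from the checkpoint, so that layer is
    # re-initialized instead of raising a size-mismatch error.
    model = AutoModelForSemanticSegmentation.from_pretrained(
        save_dir, num_labels=5, ignore_mismatched_sizes=True
    )

print(model.config.num_labels)  # 5
```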

TFAutoModelForSemanticSegmentation

class transformers.TFAutoModelForSemanticSegmentation

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a semantic segmentation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

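  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Data2VecVisionConfig configuration class: TFData2VecVisionForSemanticSegmentation (Data2VecVision model)
    • MobileViTConfig configuration class: TFMobileViTForSemanticSegmentation (MobileViT model)
    • SegformerConfig configuration class: TFSegformerForSemanticSegmentation (SegFormer model)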
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a semantic segmentation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSemanticSegmentation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
>>> model = TFAutoModelForSemanticSegmentation.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a semantic segmentation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • data2vec-vision: TFData2VecVisionForSemanticSegmentation (Data2VecVision model)
  • mobilevit: TFMobileViTForSemanticSegmentation (MobileViT model)
  • segformer: TFSegformerForSemanticSegmentation (SegFormer model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSemanticSegmentation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

>>> # Update configuration during loading
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForInstanceSegmentation

class transformers.AutoModelForInstanceSegmentation

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an instance segmentation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • MaskFormerConfig configuration class: MaskFormerForInstanceSegmentation (MaskFormer model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an instance segmentation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForInstanceSegmentation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("facebook/maskformer-swin-base-coco")
>>> model = AutoModelForInstanceSegmentation.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, check whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with an instance segmentation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • maskformer: MaskFormerForInstanceSegmentation (MaskFormer model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForInstanceSegmentation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-coco")

>>> # Update configuration during loading
>>> model = AutoModelForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-coco", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForInstanceSegmentation.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForUniversalSegmentation

class transformers.AutoModelForUniversalSegmentation

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a universal image segmentation head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • DetrConfig configuration class: DetrForSegmentation (DETR model)
    • EomtConfig configuration class: EomtForUniversalSegmentation (EoMT model)
    • Mask2FormerConfig configuration class: Mask2FormerForUniversalSegmentation (Mask2Former model)
    • MaskFormerConfig configuration class: MaskFormerForInstanceSegmentation (MaskFormer model)
    • OneFormerConfig configuration class: OneFormerForUniversalSegmentation (OneFormer model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available (torch>=2.1.1); otherwise, the manual "eager" implementation is used.
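
The selection rule above (exact configuration class to model class) and the ban on calling __init__() directly can be illustrated with a small factory. Everything here (`ToyConfig`, `OtherConfig`, `ToyForSegmentation`, `ToyAutoModelForSegmentation`) is hypothetical; the sketch only shows why a configuration class that is not in the list is rejected rather than turned into a model.

```python
class ToyConfig:
    """Hypothetical configuration class that the factory knows about."""


class OtherConfig:
    """Hypothetical configuration class that is not registered."""


class ToyForSegmentation:
    def __init__(self, config, **kwargs):
        self.config = config  # a real model would create randomly initialized weights here


class ToyAutoModelForSegmentation:
    """Hypothetical stand-in for an auto class (not transformers code)."""

    # Exact configuration class -> model class to build.
    _model_mapping = {ToyConfig: ToyForSegmentation}

    def __init__(self, *args, **kwargs):
        # Auto classes are factories only, so direct construction is refused.
        raise EnvironmentError(
            f"{type(self).__name__} must be created with from_config() or from_pretrained()."
        )

    @classmethod
    def from_config(cls, config, **kwargs):
        # Look up the exact type of the config; unknown classes are an error.
        model_class = cls._model_mapping.get(type(config))
        if model_class is None:
            raise ValueError(
                f"Unrecognized configuration class {type(config).__name__} for {cls.__name__}."
            )
        return model_class(config, **kwargs)


model = ToyAutoModelForSegmentation.from_config(ToyConfig())
print(type(model).__name__)  # ToySegmentation class: ToyForSegmentation
```

In this sketch, calling `ToyAutoModelForSegmentation()` raises `OSError` (`EnvironmentError` is an alias of it), and `from_config(OtherConfig())` raises `ValueError`.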

Instantiates one of the model classes of the library (with a universal image segmentation head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForUniversalSegmentation

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("facebook/mask2former-swin-base-coco-panoptic")
>>> model = AutoModelForUniversalSegmentation.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs are passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs are first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute overrides that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
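
The three accepted forms of pretrained_model_name_or_path can be told apart roughly as follows. `classify_source` is a hypothetical helper; using a directory check plus a ".ckpt.index" suffix check is a simplifying assumption, and the library's real loader resolves files in more detail.

```python
import os
import tempfile


def classify_source(name_or_path: str) -> str:
    """Hypothetical helper: which documented input form is `name_or_path`?"""
    if os.path.isdir(name_or_path):
        # Directory written by save_pretrained(), e.g. ./my_model_directory/
        return "local directory"
    if name_or_path.endswith(".ckpt.index"):
        # TensorFlow index checkpoint: also needs from_tf=True and config=...
        return "tensorflow index checkpoint"
    # Anything else is treated as a model id on huggingface.co.
    return "hub model id"


with tempfile.TemporaryDirectory() as tmp:
    print(classify_source(tmp))                         # local directory
print(classify_source("./tf_model/model.ckpt.index"))  # tensorflow index checkpoint
print(classify_source("google-bert/bert-base-cased"))  # hub model id
```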

Instantiate one of the model classes of the library (with a universal image segmentation head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • detr: DetrForSegmentation (DETR model)
  • eomt: EomtForUniversalSegmentation (EoMT model)
  • mask2former: Mask2FormerForUniversalSegmentation (Mask2Former model)
  • maskformer: MaskFormerForInstanceSegmentation (MaskFormer model)
  • oneformer: OneFormerForUniversalSegmentation (OneFormer model)
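
The lookup order described above (model_type first, name pattern as a fallback) can be sketched with a plain dict built from this list. `select_model_class` is hypothetical. Trying longer keys first is an assumption, meant to stop a short key (such as "bert") from matching inside a longer name (such as "roberta") in a larger mapping.

```python
# model_type -> model class name, copied from the list above.
UNIVERSAL_SEGMENTATION_MAPPING = {
    "detr": "DetrForSegmentation",
    "eomt": "EomtForUniversalSegmentation",
    "mask2former": "Mask2FormerForUniversalSegmentation",
    "maskformer": "MaskFormerForInstanceSegmentation",
    "oneformer": "OneFormerForUniversalSegmentation",
}


def select_model_class(model_type, name_or_path: str) -> str:
    """Hypothetical sketch: prefer config.model_type, else match on the name."""
    if model_type is not None:
        return UNIVERSAL_SEGMENTATION_MAPPING[model_type]
    # Fallback: substring match, longest keys first so that a short key
    # cannot shadow a longer, more specific one.
    for pattern in sorted(UNIVERSAL_SEGMENTATION_MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return UNIVERSAL_SEGMENTATION_MAPPING[pattern]
    raise ValueError(f"Cannot infer a model class from {name_or_path!r}")


print(select_model_class("oneformer", "my-local-dir"))
# OneFormerForUniversalSegmentation
print(select_model_class(None, "facebook/mask2former-swin-base-coco-panoptic"))
# Mask2FormerForUniversalSegmentation
```

A BERT checkpoint name matches none of these keys, which is why a text-only model cannot be loaded through this segmentation auto class.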

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForUniversalSegmentation

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-base-coco-panoptic")

>>> # Update configuration during loading
>>> model = AutoModelForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-base-coco-panoptic", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForUniversalSegmentation.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForZeroShotImageClassification

class transformers.AutoModelForZeroShotImageClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot image classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available (torch>=2.1.1); otherwise, the manual "eager" implementation is used.
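
The documented default for attn_implementation (SDPA when available on torch>=2.1.1, otherwise "eager") amounts to a small decision function. `resolve_attn_implementation` and its `model_supports_sdpa` flag are hypothetical; in practice you would pass attn_implementation="sdpa" (or another value) to from_pretrained() or from_config(). Versions are compared as integer tuples so that, for example, 2.10 correctly sorts after 2.1.

```python
import re

ALLOWED = ("eager", "sdpa", "flash_attention_2")


def parse_version(version: str) -> tuple:
    """'2.1.1+cu121' -> (2, 1, 1); pre-release and build tags are ignored here."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unparseable version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def resolve_attn_implementation(requested, torch_version: str, model_supports_sdpa: bool = True) -> str:
    """Hypothetical sketch of the documented default for attn_implementation."""
    if requested is not None:
        if requested not in ALLOWED:
            raise ValueError(f"attn_implementation must be one of {ALLOWED}, got {requested!r}")
        return requested  # an explicit choice always wins
    if model_supports_sdpa and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"  # manual fallback implementation


print(resolve_attn_implementation(None, "2.1.1+cu121"))  # sdpa
print(resolve_attn_implementation(None, "2.0.1"))        # eager
print(resolve_attn_implementation("eager", "2.4.0"))     # eager
```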

Instantiates one of the model classes of the library (with a zero-shot image classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForZeroShotImageClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("openai/clip-vit-base-patch32")
>>> model = AutoModelForZeroShotImageClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs are passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs are first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute overrides that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
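
With output_loading_info=True, from_pretrained() returns the model together with a loading-info dictionary (for example `model, info = AutoModelForZeroShotImageClassification.from_pretrained(..., output_loading_info=True)`). The sketch below shows with plain sets what the missing and unexpected entries mean; `loading_info` and the parameter names are hypothetical, and the library's real dictionary may carry additional keys.

```python
def loading_info(model_keys, checkpoint_keys) -> dict:
    """Hypothetical sketch: compare model parameter names with checkpoint tensors."""
    expected, found = set(model_keys), set(checkpoint_keys)
    return {
        # Needed by the model but absent from the checkpoint: left newly initialized.
        "missing_keys": sorted(expected - found),
        # Present in the checkpoint but unused by this model class: ignored.
        "unexpected_keys": sorted(found - expected),
        # Errors raised while copying tensors would be collected here; none in this sketch.
        "error_msgs": [],
    }


model_keys = ["vision.embeddings.weight", "head.weight", "head.bias"]
checkpoint_keys = ["vision.embeddings.weight", "pooler.weight"]
print(loading_info(model_keys, checkpoint_keys))
# {'missing_keys': ['head.bias', 'head.weight'], 'unexpected_keys': ['pooler.weight'], 'error_msgs': []}
```

A non-empty missing_keys list is typical when a checkpoint trained for one head is loaded into a class with a different head: the new head starts from random weights and usually needs fine-tuning.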

Instantiate one of the model classes of the library (with a zero-shot image classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • align: AlignModel (ALIGN model)
  • altclip: AltCLIPModel (AltCLIP model)
  • blip: BlipModel (BLIP model)
  • blip-2: Blip2ForImageTextRetrieval (BLIP-2 model)
  • chinese_clip: ChineseCLIPModel (Chinese-CLIP model)
  • clip: CLIPModel (CLIP model)
  • clipseg: CLIPSegModel (CLIPSeg model)
  • metaclip_2: MetaClip2Model (MetaCLIP 2 model)
  • siglip: SiglipModel (SigLIP model)
  • siglip2: Siglip2Model (SigLIP2 model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForZeroShotImageClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")

>>> # Update configuration during loading
>>> model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForZeroShotImageClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForZeroShotImageClassification

class transformers.TFAutoModelForZeroShotImageClassification

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot image classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available (torch>=2.1.1); otherwise, the manual "eager" implementation is used.

Instantiates one of the model classes of the library (with a zero-shot image classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForZeroShotImageClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("openai/clip-vit-base-patch32")
>>> model = TFAutoModelForZeroShotImageClassification.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs are passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs are first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute overrides that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
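
The proxies example above mixes a per-protocol key ('http') with a per-endpoint key ('http://hostname'). The sketch below shows one way such a dict can be resolved for a given URL, most specific key first. It follows the precedence used by the requests library's proxy selection, which is an assumption about the download layer; `select_proxy` here is a standalone re-implementation, not a transformers API.

```python
from urllib.parse import urlparse


def select_proxy(url: str, proxies: dict):
    """Pick the proxy for `url`, most specific key first (requests-style precedence)."""
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    # scheme+host, then scheme, then any-scheme+host, then the catch-all key.
    for key in (f"{parts.scheme}://{parts.hostname}", parts.scheme,
                f"all://{parts.hostname}", "all"):
        if key in proxies:
            return proxies[key]
    return None  # no proxy configured for this URL


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(select_proxy("http://hostname/model.bin", proxies))  # foo.bar:4012
print(select_proxy("http://example.com/", proxies))        # foo.bar:3128
print(select_proxy("https://huggingface.co/", proxies))    # None
```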

Instantiate one of the model classes of the library (with a zero-shot image classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, TFAutoModelForZeroShotImageClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")

>>> # Update configuration during loading
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForZeroShotObjectDetection

class transformers.AutoModelForZeroShotObjectDetection

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot object detection head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • GroundingDinoConfig configuration class: GroundingDinoForObjectDetection (Grounding DINO model)
    • MMGroundingDinoConfig configuration class: MMGroundingDinoForObjectDetection (MM Grounding DINO model)
    • OmDetTurboConfig configuration class: OmDetTurboForObjectDetection (OmDet-Turbo model)
    • OwlViTConfig configuration class: OwlViTForObjectDetection (OWL-ViT model)
    • Owlv2Config configuration class: Owlv2ForObjectDetection (OWLv2 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available (torch>=2.1.1); otherwise, the manual "eager" implementation is used.

Instantiates one of the model classes of the library (with a zero-shot object detection head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForZeroShotObjectDetection

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/owlvit-base-patch32")
>>> model = AutoModelForZeroShotObjectDetection.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; since models and other artifacts on huggingface.co are stored with a git-based system, code_revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs are passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been made already).
    • If a configuration is not provided, kwargs are first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute overrides that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute are passed to the underlying model’s __init__ function.
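
cache_dir, force_download and local_files_only together decide whether a file is read from the cache or fetched. The sketch models that decision for a single file. `resolve_file` is hypothetical, the flat cache_dir/filename layout is a simplification (the real Hub cache is organized differently), and letting local_files_only take precedence over force_download is an assumption.

```python
import os
import tempfile


def resolve_file(filename: str, cache_dir: str,
                 local_files_only: bool = False, force_download: bool = False) -> str:
    """Hypothetical decision helper: returns "cache" or "download", or raises."""
    cached = os.path.join(cache_dir, filename)  # simplified flat cache layout
    if local_files_only:
        # Never touch the network: use the cached copy or fail.
        if os.path.exists(cached):
            return "cache"
        raise FileNotFoundError(f"{filename} is not in {cache_dir} and local_files_only=True")
    if force_download or not os.path.exists(cached):
        return "download"  # forced re-download, or nothing cached yet
    return "cache"


with tempfile.TemporaryDirectory() as cache:
    print(resolve_file("config.json", cache))                       # download
    open(os.path.join(cache, "config.json"), "w").close()           # simulate a cached copy
    print(resolve_file("config.json", cache))                       # cache
    print(resolve_file("config.json", cache, force_download=True))  # download
```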

Instantiate one of the model classes of the library (with a zero-shot object detection head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • grounding-dino: GroundingDinoForObjectDetection (Grounding DINO model)
  • mm-grounding-dino: MMGroundingDinoForObjectDetection (MM Grounding DINO model)
  • omdet-turbo: OmDetTurboForObjectDetection (OmDet-Turbo model)
  • owlv2: Owlv2ForObjectDetection (OWLv2 model)
  • owlvit: OwlViTForObjectDetection (OWL-ViT model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForZeroShotObjectDetection

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

>>> # Update configuration during loading
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

Audio

The following auto classes are available for audio tasks.

AutoModelForAudioClassification

class transformers.AutoModelForAudioClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an audio classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • ASTConfig configuration class: ASTForAudioClassification (Audio Spectrogram Transformer model)
    • Data2VecAudioConfig configuration class: Data2VecAudioForSequenceClassification (Data2VecAudio model)
    • HubertConfig configuration class: HubertForSequenceClassification (Hubert model)
    • SEWConfig configuration class: SEWForSequenceClassification (SEW model)
    • SEWDConfig configuration class: SEWDForSequenceClassification (SEW-D model)
    • UniSpeechConfig configuration class: UniSpeechForSequenceClassification (UniSpeech model)
    • UniSpeechSatConfig configuration class: UniSpeechSatForSequenceClassification (UniSpeechSat model)
    • Wav2Vec2BertConfig configuration class: Wav2Vec2BertForSequenceClassification (Wav2Vec2-BERT model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForSequenceClassification (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForSequenceClassification (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMForSequenceClassification (WavLM model)
    • WhisperConfig configuration class: WhisperForAudioClassification (Whisper model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
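The default described for attn_implementation can be written as a small decision function. This is a sketch under stated assumptions: `choose_attn_implementation` is hypothetical, "if available" is reduced to a single `sdpa_available` boolean, and the sketch does not check whether the model or hardware actually supports an explicitly requested implementation.

```python
from typing import Optional, Tuple

VALID_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def choose_attn_implementation(
    requested: Optional[str],
    torch_version: Tuple[int, int, int],
    sdpa_available: bool,
) -> str:
    """Model the documented default: SDPA if available on torch>=2.1.1, else eager."""
    if requested is not None:
        # An explicit choice wins (support is NOT validated in this sketch).
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested
    # Tuples compare element-wise, so (2, 1, 1) <= (2, 3, 0) is True.
    if sdpa_available and torch_version >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(choose_attn_implementation(None, (2, 3, 0), sdpa_available=True))  # sdpa
print(choose_attn_implementation(None, (2, 0, 1), sdpa_available=True))  # eager
```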

Instantiates one of the model classes of the library (with an audio classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
>>> model = AutoModelForAudioClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
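The three accepted forms of pretrained_model_name_or_path, and the condition under which a configuration can be loaded automatically, can be told apart with plain path checks. This is a simplified sketch: `classify_pretrained_source` is a hypothetical helper, and treating every non-directory, non-".index" string as a Hub model id is an assumption of the sketch.

```python
import os
from typing import Optional


def classify_pretrained_source(
    name_or_path: str, from_tf: bool = False, config: Optional[object] = None
) -> str:
    """Classify the input forms listed above (hypothetical helper)."""
    if os.path.isdir(name_or_path):
        # Local directory (e.g. written by save_pretrained()); a config.json
        # inside it lets the configuration be loaded automatically.
        has_config = os.path.isfile(os.path.join(name_or_path, "config.json"))
        return "local-directory" if has_config else "local-directory (needs config)"
    if name_or_path.endswith(".index"):
        # TensorFlow index checkpoint: needs from_tf=True and an explicit config.
        if not from_tf or config is None:
            raise ValueError("A TF index checkpoint needs from_tf=True and config=...")
        return "tf-index-checkpoint"
    # Everything else is treated as a model id hosted on huggingface.co.
    return "hub-model-id"


print(classify_pretrained_source("MIT/ast-finetuned-audioset-10-10-0.4593"))  # hub-model-id
```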

Instantiate one of the model classes of the library (with an audio classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • audio-spectrogram-transformer: ASTForAudioClassification (Audio Spectrogram Transformer model)
  • data2vec-audio: Data2VecAudioForSequenceClassification (Data2VecAudio model)
  • hubert: HubertForSequenceClassification (Hubert model)
  • sew: SEWForSequenceClassification (SEW model)
  • sew-d: SEWDForSequenceClassification (SEW-D model)
  • unispeech: UniSpeechForSequenceClassification (UniSpeech model)
  • unispeech-sat: UniSpeechSatForSequenceClassification (UniSpeechSat model)
  • wav2vec2: Wav2Vec2ForSequenceClassification (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertForSequenceClassification (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerForSequenceClassification (Wav2Vec2-Conformer model)
  • wavlm: WavLMForSequenceClassification (WavLM model)
  • whisper: WhisperForAudioClassification (Whisper model)
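An audio classification head has to turn a variable-length sequence of frame vectors into one prediction per clip. A common approach, and the one used by Wav2Vec2ForSequenceClassification as far as we know, is to average the frames over time while ignoring padding, then apply a linear classifier. The pure-Python sketch below assumes that strategy, omits the projection layer the real model applies before pooling, and uses hypothetical helper names; other models in the list (AST, for example) pool differently.

```python
from typing import List, Sequence


def masked_mean_pool(frames: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average frame vectors over time, skipping padded frames (mask == 0)."""
    kept = [frame for frame, keep in zip(frames, mask) if keep]
    if not kept:
        raise ValueError("the mask removes every frame")
    dim = len(kept[0])
    return [sum(frame[i] for frame in kept) / len(kept) for i in range(dim)]


def classify_clip(
    frames: Sequence[Sequence[float]],
    mask: Sequence[int],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> int:
    """Pool the frames, apply a linear layer, and return the argmax label id."""
    pooled = masked_mean_pool(frames, mask)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


frames = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]  # the last frame is padding
mask = [1, 1, 0]
print(masked_mean_pool(frames, mask))  # [2.0, 1.0]
```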

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioClassification.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")

>>> # Update configuration during loading
>>> model = AutoModelForAudioClassification.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForAudioClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForAudioClassification

class transformers.TFAutoModelForAudioClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an audio classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Wav2Vec2Config configuration class: TFWav2Vec2ForSequenceClassification (Wav2Vec2 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
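The TensorFlow mapping above contains a single configuration class, so passing any other configuration (a BertConfig, for example) cannot resolve to a model class. The sketch below models that lookup; `tf_class_for_config` is hypothetical, and the exact wording of the error raised by the real auto class is an assumption here.

```python
from typing import Dict

# Config class name -> TF model class name, copied from the list above.
TF_AUDIO_CLASSIFICATION_MAPPING: Dict[str, str] = {
    "Wav2Vec2Config": "TFWav2Vec2ForSequenceClassification",
}


def tf_class_for_config(config_class_name: str) -> str:
    """Look up the TF model class for a config class (hypothetical helper)."""
    try:
        return TF_AUDIO_CLASSIFICATION_MAPPING[config_class_name]
    except KeyError:
        supported = ", ".join(sorted(TF_AUDIO_CLASSIFICATION_MAPPING))
        raise ValueError(
            f"Unrecognized configuration class {config_class_name} for this kind "
            f"of auto model. Supported: {supported}"
        ) from None


print(tf_class_for_config("Wav2Vec2Config"))  # TFWav2Vec2ForSequenceClassification
```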

Instantiates one of the model classes of the library (with an audio classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForAudioClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("facebook/wav2vec2-base-960h")
>>> model = TFAutoModelForAudioClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
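The cache_dir, force_download and local_files_only options interact when deciding where a file comes from. The sketch below is a simplified model of that decision, not the Hub client's logic: `resolve_file` is hypothetical, the cache is reduced to a `cached` boolean, and giving local_files_only precedence over force_download is an assumption of the sketch.

```python
def resolve_file(cached: bool, force_download: bool = False, local_files_only: bool = False) -> str:
    """Decide where a file comes from under the flags above (hypothetical helper)."""
    if local_files_only:
        # Never touch the network: the file must already be in the cache.
        if cached:
            return "cache"
        raise FileNotFoundError("file is not cached and local_files_only=True")
    if force_download or not cached:
        # force_download ignores any cached copy and fetches the file again.
        return "download"
    return "cache"


print(resolve_file(cached=True))                       # cache
print(resolve_file(cached=True, force_download=True))  # download
```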

Instantiate one of the model classes of the library (with an audio classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • wav2vec2: TFWav2Vec2ForSequenceClassification (Wav2Vec2 model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForAudioClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base-960h")

>>> # Update configuration during loading
>>> model = TFAutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base-960h", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForAudioClassification.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForAudioFrameClassification

class transformers.AutoModelForAudioFrameClassification

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an audio frame (token) classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Data2VecAudioConfig configuration class: Data2VecAudioForAudioFrameClassification (Data2VecAudio model)
    • UniSpeechSatConfig configuration class: UniSpeechSatForAudioFrameClassification (UniSpeechSat model)
    • Wav2Vec2BertConfig configuration class: Wav2Vec2BertForAudioFrameClassification (Wav2Vec2-BERT model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForAudioFrameClassification (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForAudioFrameClassification (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMForAudioFrameClassification (WavLM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
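An audio frame classification head predicts one label per encoder frame rather than one per clip, so the number of predictions depends on how the feature encoder downsamples the waveform. The sketch below assumes the default convolutional feature encoder of Wav2Vec2Config (kernels 10,3,3,3,3,2,2 and strides 5,2,2,2,2,2,2, no padding); models with other front ends, such as Wav2Vec2-BERT, which takes spectrogram-style features, will differ. `num_frames` is a hypothetical helper.

```python
from typing import Sequence

# Default Wav2Vec2Config feature-encoder settings (assumed; check your config).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_frames(
    num_samples: int,
    kernels: Sequence[int] = CONV_KERNEL,
    strides: Sequence[int] = CONV_STRIDE,
) -> int:
    """Sequence length after a stack of unpadded 1D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio -> 49 frames, i.e. 49 per-frame predictions.
print(num_frames(16_000))  # 49
```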

Instantiates one of the model classes of the library (with an audio frame (token) classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioFrameClassification

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("microsoft/wavlm-base-plus-sd")
>>> model = AutoModelForAudioFrameClassification.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
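The missing and unexpected keys reported with output_loading_info=True are, in essence, set differences between the parameter names the model expects and those the checkpoint provides. This sketch covers only those two entries (the real dictionary also carries error messages and possibly other fields), and `loading_info` plus the parameter names in the example are hypothetical.

```python
from typing import Dict, Iterable, List


def loading_info(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Compare parameter names the way a loading report would (hypothetical helper)."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint, e.g. a new task head.
        "missing_keys": sorted(model_set - ckpt_set),
        # Present in the checkpoint but with no matching parameter in the model.
        "unexpected_keys": sorted(ckpt_set - model_set),
    }


info = loading_info(
    ["encoder.w", "classifier.weight", "classifier.bias"],
    ["encoder.w", "lm_head.weight"],
)
print(info)  # {'missing_keys': ['classifier.bias', 'classifier.weight'], 'unexpected_keys': ['lm_head.weight']}
```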

Instantiate one of the model classes of the library (with an audio frame (token) classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • data2vec-audio: Data2VecAudioForAudioFrameClassification (Data2VecAudio model)
  • unispeech-sat: UniSpeechSatForAudioFrameClassification (UniSpeechSat model)
  • wav2vec2: Wav2Vec2ForAudioFrameClassification (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertForAudioFrameClassification (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerForAudioFrameClassification (Wav2Vec2-Conformer model)
  • wavlm: WavLMForAudioFrameClassification (WavLM model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioFrameClassification

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioFrameClassification.from_pretrained("microsoft/wavlm-base-plus-sd")

>>> # Update configuration during loading
>>> model = AutoModelForAudioFrameClassification.from_pretrained("microsoft/wavlm-base-plus-sd", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForAudioFrameClassification.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForCTC

class transformers.AutoModelForCTC

< >

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a connectionist temporal classification head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

< >

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Data2VecAudioConfig configuration class: Data2VecAudioForCTC (Data2VecAudio model)
    • HubertConfig configuration class: HubertForCTC (Hubert model)
    • MCTCTConfig configuration class: MCTCTForCTC (M-CTC-T model)
    • SEWConfig configuration class: SEWForCTC (SEW model)
    • SEWDConfig configuration class: SEWDForCTC (SEW-D model)
    • UniSpeechConfig configuration class: UniSpeechForCTC (UniSpeech model)
    • UniSpeechSatConfig configuration class: UniSpeechSatForCTC (UniSpeechSat model)
    • Wav2Vec2BertConfig configuration class: Wav2Vec2BertForCTC (Wav2Vec2-BERT model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForCTC (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForCTC (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMForCTC (WavLM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
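A connectionist temporal classification (CTC) head emits one token distribution per audio frame, including a special blank token. The usual greedy way to turn per-frame argmax ids into an output sequence is to collapse consecutive repeats and then drop the blanks, so a blank between two identical ids keeps both. The sketch below shows only that decoding step; `ctc_greedy_decode` is hypothetical and blank id 0 is an assumption (check your tokenizer's vocabulary).

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse repeated frame predictions, then remove the blank token."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


# Per-frame argmax ids: the blank between the two 3s keeps them as separate tokens.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```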

Instantiates one of the model classes of the library (with a connectionist temporal classification head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForCTC

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("facebook/wav2vec2-base-960h")
>>> model = AutoModelForCTC.from_config(config)

from_pretrained

< >

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., not try downloading the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
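The two cases above can be modelled in a few lines of plain Python. This is a simplified sketch of the rule for the case where no config is passed, not the library's own code: `split_from_pretrained_kwargs` is a hypothetical helper, the loaded configuration is represented as a plain dict of attributes, and `my_custom_flag` is an invented key used only to show the pass-through path.

```python
from typing import Any


def split_from_pretrained_kwargs(
    config_attrs: dict[str, Any], kwargs: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split kwargs into config overrides and leftover model __init__ kwargs.

    Hypothetical model of the documented rule when no `config` is passed:
    keys matching an existing configuration attribute override it, and
    every other key is forwarded to the model's __init__.
    """
    updated_config = dict(config_attrs)  # copy so the caller's dict is untouched
    model_kwargs: dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in updated_config:
            updated_config[key] = value  # overrides the loaded configuration value
        else:
            model_kwargs[key] = value  # passed through to the model's __init__
    return updated_config, model_kwargs


# Hypothetical loaded configuration attributes and user-supplied kwargs.
loaded = {"output_attentions": False, "hidden_size": 768}
config, model_kwargs = split_from_pretrained_kwargs(
    loaded, {"output_attentions": True, "my_custom_flag": 1}
)
print(config)        # {'output_attentions': True, 'hidden_size': 768}
print(model_kwargs)  # {'my_custom_flag': 1}
```

In this sketch, keys that match a configuration attribute never reach the model constructor, which mirrors why output_attentions=True in the examples shows up on model.config.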

Instantiate one of the model classes of the library (with a connectionist temporal classification head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • data2vec-audio: Data2VecAudioForCTC (Data2VecAudio model)
  • hubert: HubertForCTC (Hubert model)
  • mctct: MCTCTForCTC (M-CTC-T model)
  • sew: SEWForCTC (SEW model)
  • sew-d: SEWDForCTC (SEW-D model)
  • unispeech: UniSpeechForCTC (UniSpeech model)
  • unispeech-sat: UniSpeechSatForCTC (UniSpeechSat model)
  • wav2vec2: Wav2Vec2ForCTC (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertForCTC (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerForCTC (Wav2Vec2-Conformer model)
  • wavlm: WavLMForCTC (WavLM model)
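The selection rule described above (use model_type when it is known, otherwise fall back to pattern matching on pretrained_model_name_or_path) can be sketched as a dictionary lookup. This is an illustration under assumptions, not the Transformers implementation: the table is a subset of the mapping listed above with class names stored as strings, `resolve_ctc_class` is hypothetical, and trying longer keys first is an assumed tie-break so that overlapping keys such as `wav2vec2` and `wav2vec2-conformer` resolve to the more specific one.

```python
from typing import Optional

# Subset of the model_type -> class mapping listed above, with class names as strings.
CTC_MAPPING = {
    "data2vec-audio": "Data2VecAudioForCTC",
    "hubert": "HubertForCTC",
    "wav2vec2": "Wav2Vec2ForCTC",
    "wav2vec2-bert": "Wav2Vec2BertForCTC",
    "wav2vec2-conformer": "Wav2Vec2ConformerForCTC",
    "wavlm": "WavLMForCTC",
}


def resolve_ctc_class(model_type: Optional[str], name_or_path: str) -> str:
    """Pick a class name from model_type, else by substring matching on name_or_path."""
    if model_type is not None:
        return CTC_MAPPING[model_type]  # raises KeyError for unsupported model types
    # Fallback (assumed ordering): try longer keys first so "wav2vec2-conformer" wins over "wav2vec2".
    for pattern in sorted(CTC_MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return CTC_MAPPING[pattern]
    raise ValueError(f"Cannot infer a CTC model class for {name_or_path!r}")


print(resolve_ctc_class("hubert", "my-checkpoint"))  # HubertForCTC
print(resolve_ctc_class(None, "./checkpoints/wav2vec2-conformer-large"))  # Wav2Vec2ConformerForCTC
```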

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForCTC

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h")

>>> # Update configuration during loading
>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForCTC.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
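The AutoModelForCTC classes above attach a connectionist temporal classification (CTC) head, which emits one prediction per audio frame rather than one per output token. As a rough illustration of how those frame-level predictions become a token sequence, here is standard greedy (best-path) CTC decoding in plain Python. It is a sketch under assumptions: the input is a list of per-frame argmax ids, blank id 0 is assumed (the real blank id depends on the model's tokenizer), and `ctc_greedy_decode` is a hypothetical helper, not a Transformers function.

```python
from typing import Optional


def ctc_greedy_decode(frame_ids: list[int], blank_id: int = 0) -> list[int]:
    """Collapse consecutive repeated ids, then drop blank ids (greedy CTC decoding)."""
    decoded: list[int] = []
    previous: Optional[int] = None
    for token in frame_ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Hypothetical argmax ids for 9 audio frames (0 is the assumed blank id).
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0]
print(ctc_greedy_decode(frames))  # [7, 7, 3]
```

The blank between the two runs of 7 is what lets CTC emit the same token twice in a row. In practice this step is usually handled by the model's tokenizer or processor (for example Wav2Vec2Processor.batch_decode), whose details may differ from this simplified version.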

AutoModelForSpeechSeq2Seq

class transformers.AutoModelForSpeechSeq2Seq

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • DiaConfig configuration class: DiaForConditionalGeneration (Dia model)
    • GraniteSpeechConfig configuration class: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
    • KyutaiSpeechToTextConfig configuration class: KyutaiSpeechToTextForConditionalGeneration (KyutaiSpeechToText model)
    • MoonshineConfig configuration class: MoonshineForConditionalGeneration (Moonshine model)
    • Pop2PianoConfig configuration class: Pop2PianoForConditionalGeneration (Pop2Piano model)
    • SeamlessM4TConfig configuration class: SeamlessM4TForSpeechToText (SeamlessM4T model)
    • SeamlessM4Tv2Config configuration class: SeamlessM4Tv2ForSpeechToText (SeamlessM4Tv2 model)
    • Speech2TextConfig configuration class: Speech2TextForConditionalGeneration (Speech2Text model)
    • SpeechEncoderDecoderConfig configuration class: SpeechEncoderDecoderModel (Speech Encoder decoder model)
    • SpeechT5Config configuration class: SpeechT5ForSpeechToText (SpeechT5 model)
    • WhisperConfig configuration class: WhisperForConditionalGeneration (Whisper model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
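The default described for attn_implementation can be expressed as a small decision function. This is a hedged sketch of the documented rule only: `pick_attn_implementation` and `parse_version` are hypothetical helpers, `sdpa_supported` stands in for whatever availability check the library actually performs, and the version parser only handles plain `major.minor.patch` strings with an optional `+local` suffix.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.1.1' or '2.3.0+cu121' into a comparable tuple of ints."""
    release = version.split("+")[0]  # drop a local suffix such as '+cu121'
    return tuple(int(part) for part in release.split(".")[:3])


def pick_attn_implementation(
    requested: Optional[str], torch_version: str, sdpa_supported: bool
) -> str:
    """Sketch of the documented default: SDPA on torch>=2.1.1 when available, else eager."""
    allowed = {"eager", "sdpa", "flash_attention_2"}  # the three values listed above
    if requested is not None:
        if requested not in allowed:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested  # an explicit choice is used as-is
    if sdpa_supported and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(pick_attn_implementation(None, "2.3.0+cu121", sdpa_supported=True))  # sdpa
print(pick_attn_implementation(None, "2.1.0", sdpa_supported=True))        # eager
```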

Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForSpeechSeq2Seq

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("openai/whisper-tiny")
>>> model = AutoModelForSpeechSeq2Seq.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. Because huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. Because huggingface.co uses a git-based system for storing models and other artifacts, code_revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
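The third auto-loading condition for config above (a local directory passed as pretrained_model_name_or_path that contains a file named config.json) can be checked before calling from_pretrained. The helper below is hypothetical and mirrors only that one documented condition; it does not model Hub ids or the other loading paths, and it does not validate the file's contents.

```python
import os
import tempfile
from typing import Optional


def local_config_path(name_or_path: str) -> Optional[str]:
    """Return the path to config.json if name_or_path is a local directory containing one."""
    if os.path.isdir(name_or_path):
        candidate = os.path.join(name_or_path, "config.json")
        if os.path.isfile(candidate):
            return candidate
    return None  # not a local directory, or no config.json inside it


with tempfile.TemporaryDirectory() as save_dir:
    print(local_config_path(save_dir))  # None: the directory is still empty
    with open(os.path.join(save_dir, "config.json"), "w") as f:
        f.write("{}")  # placeholder; a real config.json holds the model's settings
    print(local_config_path(save_dir) is not None)  # True
```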

Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • dia: DiaForConditionalGeneration (Dia model)
  • granite_speech: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
  • kyutai_speech_to_text: KyutaiSpeechToTextForConditionalGeneration (KyutaiSpeechToText model)
  • moonshine: MoonshineForConditionalGeneration (Moonshine model)
  • pop2piano: Pop2PianoForConditionalGeneration (Pop2Piano model)
  • seamless_m4t: SeamlessM4TForSpeechToText (SeamlessM4T model)
  • seamless_m4t_v2: SeamlessM4Tv2ForSpeechToText (SeamlessM4Tv2 model)
  • speech-encoder-decoder: SpeechEncoderDecoderModel (Speech Encoder decoder model)
  • speech_to_text: Speech2TextForConditionalGeneration (Speech2Text model)
  • speecht5: SpeechT5ForSpeechToText (SpeechT5 model)
  • whisper: WhisperForConditionalGeneration (Whisper model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForSpeechSeq2Seq

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")

>>> # Update configuration during loading
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForSpeechSeq2Seq

class transformers.TFAutoModelForSpeechSeq2Seq

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Speech2TextConfig configuration class: TFSpeech2TextForConditionalGeneration (Speech2Text model)
    • WhisperConfig configuration class: TFWhisperForConditionalGeneration (Whisper model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSpeechSeq2Seq

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("openai/whisper-tiny")
>>> model = TFAutoModelForSpeechSeq2Seq.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and then loading the TensorFlow model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. Because huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. Because huggingface.co uses a git-based system for storing models and other artifacts, code_revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • speech_to_text: TFSpeech2TextForConditionalGeneration (Speech2Text model)
  • whisper: TFWhisperForConditionalGeneration (Whisper model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForSpeechSeq2Seq

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")

>>> # Update configuration during loading
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForSpeechSeq2Seq

class transformers.FlaxAutoModelForSpeechSeq2Seq

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • SpeechEncoderDecoderConfig configuration class: FlaxSpeechEncoderDecoderModel (Speech Encoder decoder model)
    • WhisperConfig configuration class: FlaxWhisperForConditionalGeneration (Whisper model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSpeechSeq2Seq

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("openai/whisper-tiny")
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and then loading the Flax model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. Because huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. Because huggingface.co uses a git-based system for storing models and other artifacts, code_revision can be any identifier allowed by git, such as a branch name, a tag name, or a commit id.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have already been made).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • speech-encoder-decoder: FlaxSpeechEncoderDecoderModel (Speech Encoder decoder model)
  • whisper: FlaxWhisperForConditionalGeneration (Whisper model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForSpeechSeq2Seq

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForAudioXVector

class transformers.AutoModelForAudioXVector

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an audio retrieval via x-vector head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
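An x-vector head maps an utterance to a fixed-size speaker embedding; retrieval or verification then compares such embeddings, commonly with cosine similarity. The sketch below shows only that comparison step in plain Python. The 0.8 threshold, the toy 3-dimensional vectors, and the helper names `cosine_similarity` and `same_speaker` are illustrative assumptions; in practice the embeddings come from the model and the threshold is tuned per model and dataset.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(a: list[float], b: list[float], threshold: float = 0.8) -> bool:
    """Hypothetical verification rule: accept when the similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional "x-vectors"; real embeddings are much larger.
enrolled = [0.9, 0.1, 0.4]
probe = [0.8, 0.2, 0.5]
print(same_speaker(enrolled, probe))  # True
```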

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • Data2VecAudioConfig configuration class: Data2VecAudioForXVector (Data2VecAudio model)
    • UniSpeechSatConfig configuration class: UniSpeechSatForXVector (UniSpeechSat model)
    • Wav2Vec2BertConfig configuration class: Wav2Vec2BertForXVector (Wav2Vec2-BERT model)
    • Wav2Vec2Config configuration class: Wav2Vec2ForXVector (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForXVector (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMForXVector (WavLM model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with an audio retrieval via x-vector head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioXVector

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("microsoft/wavlm-base-plus-sv")
>>> model = AutoModelForAudioXVector.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (it is assumed that all relevant updates to the configuration have already been made).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
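The kwargs routing rules above can be illustrated with a small pure-Python sketch. It does not call transformers: ToyConfig and resolve_kwargs are hypothetical names, and a dataclass stands in for a PretrainedConfig subclass. It only shows how keys are routed: keys that match config attributes become overrides, the rest are left for the model's __init__, and an explicitly provided config receives no overrides at all.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class ToyConfig:
    # Hypothetical stand-in for a PretrainedConfig subclass.
    hidden_size: int = 768
    output_attentions: bool = False


def resolve_kwargs(kwargs, config=None, loaded_config=None):
    """Route kwargs the way the docstring describes (simplified sketch)."""
    if config is not None:
        # A config was passed explicitly: every kwarg goes to the model's __init__.
        return config, dict(kwargs)
    # No config passed: start from the "loaded" config (class defaults here).
    base = loaded_config if loaded_config is not None else ToyConfig()
    config_keys = {f.name for f in fields(base)}
    overrides = {k: v for k, v in kwargs.items() if k in config_keys}
    model_kwargs = {k: v for k, v in kwargs.items() if k not in config_keys}
    # Matching attributes are overridden; leftover keys are meant for __init__.
    return replace(base, **overrides), model_kwargs


cfg, model_kwargs = resolve_kwargs({"output_attentions": True, "custom_flag": 1})
print(cfg.output_attentions, model_kwargs)  # True {'custom_flag': 1}

cfg, model_kwargs = resolve_kwargs({"output_attentions": True}, config=ToyConfig())
print(cfg.output_attentions, model_kwargs)  # False {'output_attentions': True}
```

custom_flag is a made-up keyword used only to show a key that matches no configuration attribute.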

Instantiate one of the model classes of the library (with an audio retrieval via x-vector head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • data2vec-audio: Data2VecAudioForXVector (Data2VecAudio model)
  • unispeech-sat: UniSpeechSatForXVector (UniSpeechSat model)
  • wav2vec2: Wav2Vec2ForXVector (Wav2Vec2 model)
  • wav2vec2-bert: Wav2Vec2BertForXVector (Wav2Vec2-BERT model)
  • wav2vec2-conformer: Wav2Vec2ConformerForXVector (Wav2Vec2-Conformer model)
  • wavlm: WavLMForXVector (WavLM model)
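The selection rule above (use model_type when it is known, otherwise match patterns against the name or path) can be sketched in plain Python. This is a simplified illustration, not the library's code: select_class_name is a hypothetical helper that returns class names as strings, and trying longer patterns first is an assumption made here so that, for example, wav2vec2-conformer is not shadowed by wav2vec2.

```python
# model_type -> class name, copied from the list above.
XVECTOR_MAPPING = {
    "data2vec-audio": "Data2VecAudioForXVector",
    "unispeech-sat": "UniSpeechSatForXVector",
    "wav2vec2": "Wav2Vec2ForXVector",
    "wav2vec2-bert": "Wav2Vec2BertForXVector",
    "wav2vec2-conformer": "Wav2Vec2ConformerForXVector",
    "wavlm": "WavLMForXVector",
}


def select_class_name(model_type, name_or_path):
    """Pick a class name from model_type, falling back to pattern matching."""
    if model_type is not None:
        return XVECTOR_MAPPING[model_type]
    # Fallback: longest pattern first, so "wav2vec2-conformer" wins over "wav2vec2".
    for pattern in sorted(XVECTOR_MAPPING, key=len, reverse=True):
        if pattern in str(name_or_path):
            return XVECTOR_MAPPING[pattern]
    raise ValueError(f"Unrecognized model in {name_or_path!r}")


print(select_class_name("wavlm", "any/path"))  # WavLMForXVector
print(select_class_name(None, "my-org/wav2vec2-conformer-sv"))  # Wav2Vec2ConformerForXVector
```

The repository names in the usage lines are placeholders; only the substring matters for the fallback.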

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForAudioXVector

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioXVector.from_pretrained("microsoft/wavlm-base-plus-sv")

>>> # Update configuration during loading
>>> model = AutoModelForAudioXVector.from_pretrained("microsoft/wavlm-base-plus-sv", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForAudioXVector.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForTextToSpectrogram

class transformers.AutoModelForTextToSpectrogram

( *args, **kwargs )

AutoModelForTextToWaveform

class transformers.AutoModelForTextToWaveform

( *args, **kwargs )

Multimodal

The following auto classes are available for multimodal tasks.

AutoModelForTableQuestionAnswering

class transformers.AutoModelForTableQuestionAnswering

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a table question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • TapasConfig configuration class: TapasForQuestionAnswering (TAPAS model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
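The documented default for attn_implementation can be expressed as a small decision function. This is a hypothetical sketch: default_attn_implementation and parse_version are illustrative names, "if available" is modeled as a single model_supports_sdpa flag, and the version parser only reads the leading major.minor.patch numbers (a real project would more likely use packaging.version).

```python
from itertools import takewhile

ALLOWED = {"eager", "sdpa", "flash_attention_2"}


def parse_version(version):
    """'2.1.0+cu118' -> (2, 1, 0); keeps only leading digits of each part."""
    core = version.split("+", 1)[0]
    nums = []
    for piece in core.split(".")[:3]:
        lead = "".join(takewhile(str.isdigit, piece))
        nums.append(int(lead) if lead else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def default_attn_implementation(torch_version, model_supports_sdpa, requested=None):
    """Mirror the documented default (sketch, not the library's logic)."""
    if requested is not None:
        # An explicit choice wins; this sketch does not check that it is installed.
        if requested not in ALLOWED:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested
    if model_supports_sdpa and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(default_attn_implementation("2.1.1", model_supports_sdpa=True))  # sdpa
print(default_attn_implementation("2.1.0+cu118", model_supports_sdpa=True))  # eager
```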

Instantiates one of the model classes of the library (with a table question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForTableQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/tapas-base-finetuned-wtq")
>>> model = AutoModelForTableQuestionAnswering.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (it is assumed that all relevant updates to the configuration have already been made).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
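The dictionary returned with output_loading_info=True boils down to comparing the parameter names the model expects with the names found in the checkpoint. The sketch below illustrates that comparison in pure Python; loading_info is a hypothetical helper, the parameter names and shapes are made up, and the real dictionary may contain more keys than the three shown here.

```python
def loading_info(model_shapes, checkpoint_shapes):
    """Compare expected parameters with checkpoint entries (hypothetical helper).

    Both arguments map parameter names to shape tuples.
    """
    expected, provided = set(model_shapes), set(checkpoint_shapes)
    error_msgs = [
        f"size mismatch for {name}: checkpoint {checkpoint_shapes[name]}, model {model_shapes[name]}"
        for name in sorted(expected & provided)
        if checkpoint_shapes[name] != model_shapes[name]
    ]
    return {
        "missing_keys": sorted(expected - provided),     # in the model, not in the checkpoint
        "unexpected_keys": sorted(provided - expected),  # in the checkpoint, not in the model
        "error_msgs": error_msgs,
    }


# A new task head on top of a base checkpoint that carries a different head.
model = {"encoder.weight": (768, 768), "qa_head.weight": (1, 768)}
ckpt = {"encoder.weight": (768, 768), "mlm_head.weight": (30522, 768)}
info = loading_info(model, ckpt)
print(info["missing_keys"])     # ['qa_head.weight']
print(info["unexpected_keys"])  # ['mlm_head.weight']
```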

Instantiate one of the model classes of the library (with a table question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • tapas: TapasForQuestionAnswering (TAPAS model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForTableQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

>>> # Update configuration during loading
>>> model = AutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/tapas_tf_model_config.json")
>>> model = AutoModelForTableQuestionAnswering.from_pretrained(
...     "./tf_model/tapas_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForTableQuestionAnswering

class transformers.TFAutoModelForTableQuestionAnswering

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a table question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • TapasConfig configuration class: TFTapasForQuestionAnswering (TAPAS model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a table question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForTableQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/tapas-base-finetuned-wtq")
>>> model = TFAutoModelForTableQuestionAnswering.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model with the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (it is assumed that all relevant updates to the configuration have already been made).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
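The proxies dictionary keys either a bare scheme or a scheme://hostname pair to a proxy address. The sketch below shows one plausible resolution order, most specific key first, modeled on the lookup order used by the requests library; pick_proxy is a standalone hypothetical helper, and the HTTP client that actually performs the downloads may apply its own rules.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy for url, trying the most specific key first (sketch)."""
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # e.g. "http://hostname"
        parts.scheme,                          # e.g. "http"
        f"all://{parts.hostname}",
        "all",
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None  # no proxy configured for this URL


# The example dictionary from the proxies parameter above.
proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(pick_proxy("http://hostname/model.bin", proxies))      # foo.bar:4012
print(pick_proxy("http://example.com/model.bin", proxies))   # foo.bar:3128
print(pick_proxy("https://example.com/model.bin", proxies))  # None
```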

Instantiate one of the model classes of the library (with a table question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • tapas: TFTapasForQuestionAnswering (TAPAS model)

Examples:

>>> from transformers import AutoConfig, TFAutoModelForTableQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

>>> # Update configuration during loading
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/tapas_pt_model_config.json")
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained(
...     "./pt_model/tapas_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForDocumentQuestionAnswering

class transformers.AutoModelForDocumentQuestionAnswering

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a document question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • LayoutLMConfig configuration class: LayoutLMForQuestionAnswering (LayoutLM model)
    • LayoutLMv2Config configuration class: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
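As the list above shows, from_config() picks the model class from the type of the configuration object it receives, rather than from a model name. A minimal sketch of that lookup, assuming a plain dict keyed by class: the three config classes are empty stand-ins defined locally (not the transformers classes), model classes are represented by their names, and model_class_for is a hypothetical helper.

```python
class LayoutLMConfig:  # local stand-in, not transformers.LayoutLMConfig
    pass


class LayoutLMv2Config:  # local stand-in
    pass


class LayoutLMv3Config:  # local stand-in
    pass


# Config class -> model class name, copied from the list above.
DOC_QA_MAPPING = {
    LayoutLMConfig: "LayoutLMForQuestionAnswering",
    LayoutLMv2Config: "LayoutLMv2ForQuestionAnswering",
    LayoutLMv3Config: "LayoutLMv3ForQuestionAnswering",
}


def model_class_for(config):
    """Return the model class name registered for type(config)."""
    config_cls = type(config)
    if config_cls not in DOC_QA_MAPPING:
        supported = ", ".join(cls.__name__ for cls in DOC_QA_MAPPING)
        raise ValueError(
            f"Unrecognized configuration class {config_cls.__name__}; expected one of {supported}"
        )
    return DOC_QA_MAPPING[config_cls]


print(model_class_for(LayoutLMv3Config()))  # LayoutLMv3ForQuestionAnswering
```

This sketch looks up the exact class with type(config); how the library itself treats subclasses of a registered configuration class is not modeled here.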

Instantiates one of the model classes of the library (with a document question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForDocumentQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> model = AutoModelForDocumentQuestionAnswering.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model with the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, though, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id; because models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (it is assumed that all relevant updates to the configuration have already been made).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override that attribute with the supplied value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.

Instantiate one of the model classes of the library (with a document question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • layoutlm: LayoutLMForQuestionAnswering (LayoutLM model)
  • layoutlmv2: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
  • layoutlmv3: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForDocumentQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")

>>> # Update configuration during loading
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/layoutlm_tf_model_config.json")
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained(
...     "./tf_model/layoutlm_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForDocumentQuestionAnswering

class transformers.TFAutoModelForDocumentQuestionAnswering

( *args, **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a document question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • LayoutLMConfig configuration class: TFLayoutLMForQuestionAnswering (LayoutLM model)
    • LayoutLMv3Config configuration class: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a document question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForDocumentQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> model = TFAutoModelForDocumentQuestionAnswering.from_config(config)

from_pretrained

( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model with the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
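The two cases above can be sketched in plain Python. This is a simplified model of the documented routing rule, not the library's code: `route_kwargs`, the `my_flag` key, and the use of a plain dict in place of a PretrainedConfig are all hypothetical.

```python
def route_kwargs(config_attrs, kwargs, config_provided):
    """Split from_pretrained()-style kwargs into (config attributes, model __init__ kwargs).

    Hypothetical helper: config_attrs is a plain dict standing in for the
    attributes of a loaded configuration object.
    """
    if config_provided:
        # An explicit config was passed: leave it untouched and forward every kwarg
        # to the model's __init__.
        return dict(config_attrs), dict(kwargs)

    updated = dict(config_attrs)
    model_kwargs = {}
    for key, value in kwargs.items():
        if key in updated:
            # The key names a configuration attribute: override it.
            updated[key] = value
        else:
            # Not a configuration attribute: pass it on to the model's __init__.
            model_kwargs[key] = value
    return updated, model_kwargs


loaded = {"output_attentions": False, "hidden_size": 768}
config, model_kwargs = route_kwargs(loaded, {"output_attentions": True, "my_flag": 1}, config_provided=False)
print(config)        # {'output_attentions': True, 'hidden_size': 768}
print(model_kwargs)  # {'my_flag': 1}
```

This is why `output_attentions=True` in the examples ends up on `model.config`: it matches a configuration attribute, so it is treated as a config override rather than a model argument.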

Instantiate one of the model classes of the library (with a document question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • layoutlm: TFLayoutLMForQuestionAnswering (LayoutLM model)
  • layoutlmv3: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
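The selection rule above (exact `model_type` lookup first, substring matching on the name or path as a fallback) can be sketched as follows. The trimmed mapping, the helper name, and the longest-pattern-first ordering are assumptions of this sketch rather than a description of the library's internals; the ordering matters because "layoutlm" is a substring of "layoutlmv3".

```python
# Hypothetical, trimmed-down mapping built from the two entries listed above.
MODEL_MAPPING = {
    "layoutlm": "TFLayoutLMForQuestionAnswering",
    "layoutlmv3": "TFLayoutLMv3ForQuestionAnswering",
}


def select_model_class(name_or_path, model_type=None):
    """Pick a model class name from model_type, else by substring match on the name/path."""
    if model_type is not None:
        # Preferred path: the config's model_type is an exact key of the mapping.
        return MODEL_MAPPING[model_type]
    # Fallback: try longer patterns first so "layoutlmv3" is not shadowed by "layoutlm".
    for pattern in sorted(MODEL_MAPPING, key=len, reverse=True):
        if pattern in str(name_or_path):
            return MODEL_MAPPING[pattern]
    raise ValueError(f"Unrecognized model in {name_or_path!r}")


print(select_model_class("impira/layoutlm-document-qa"))  # TFLayoutLMForQuestionAnswering
print(select_model_class("./my-layoutlmv3-checkpoint"))   # TFLayoutLMv3ForQuestionAnswering
```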

Examples:

>>> from transformers import AutoConfig, TFAutoModelForDocumentQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")

>>> # Update configuration during loading
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/layoutlm_pt_model_config.json")
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained(
...     "./pt_model/layoutlm_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForVisualQuestionAnswering

class transformers.AutoModelForVisualQuestionAnswering

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a visual question answering head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

Instantiates one of the model classes of the library (with a visual question answering head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForVisualQuestionAnswering

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
>>> model = AutoModelForVisualQuestionAnswering.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
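The proxies parameter above accepts keys that are either a protocol (`'http'`) or a protocol plus host (`'http://hostname'`). The sketch below shows how such a dict might be resolved for a single request, modeled on the most-specific-key-first lookup order of the `requests` library. Whether a given Transformers or huggingface_hub version resolves proxies exactly this way is an assumption, and `pick_proxy` is a hypothetical helper.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy for url, preferring 'scheme://host' keys over bare 'scheme' keys."""
    parts = urlparse(url)
    if parts.hostname is None:
        # No host to match against: fall back to the scheme or the catch-all entry.
        return proxies.get(parts.scheme, proxies.get("all"))
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # most specific: protocol + host
        parts.scheme,                          # protocol only
        f"all://{parts.hostname}",             # any protocol, this host
        "all",                                 # catch-all
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None  # no matching entry: connect directly


proxies = {"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
print(pick_proxy("http://hostname/model.bin", proxies))   # foo.bar:4012
print(pick_proxy("http://other/model.bin", proxies))      # foo.bar:3128
print(pick_proxy("https://hostname/model.bin", proxies))  # None
```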

Instantiate one of the model classes of the library (with a visual question answering head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

The model is set in evaluation mode by default using model.eval() (for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().
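To illustrate why the default evaluation mode matters, here is a toy, framework-free version of inverted dropout, the kind of layer that model.eval() switches off. PyTorch's nn.Dropout follows the same idea (zero with probability p and scale survivors by 1/(1-p) during training, identity during evaluation), but this `dropout()` function is a hypothetical illustration, not PyTorch's implementation.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout: in training, zero each value with prob p and rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        # Evaluation mode: dropout is the identity, so outputs are deterministic.
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


x = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)
print(dropout(x, p=0.5, training=False, rng=rng))  # [1.0, 2.0, 3.0, 4.0]
print(dropout(x, p=0.5, training=True, rng=rng))   # each value is either 0.0 or doubled
```

Forgetting to call model.train() before fine-tuning therefore silently disables this regularization.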

Examples:

>>> from transformers import AutoConfig, AutoModelForVisualQuestionAnswering

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

>>> # Update configuration during loading
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/vilt_tf_model_config.json")
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained(
...     "./tf_model/vilt_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForVision2Seq

class transformers.AutoModelForVision2Seq

( *args **kwargs )

TFAutoModelForVision2Seq

class transformers.TFAutoModelForVision2Seq

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a vision-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
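The default rule stated for attn_implementation can be written as a small decision function. This is a sketch under assumptions: `model_supports_sdpa` stands in for whatever per-model availability check the library actually performs, and `parse_version` and `default_attn_implementation` are hypothetical helpers.

```python
import re

VALID_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def parse_version(version):
    """Turn '2.1.1', '2.3.0+cu121' or '2.1.0a0' into a comparable (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Cannot parse version {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def default_attn_implementation(requested, torch_version, model_supports_sdpa):
    if requested is not None:
        # An explicit choice always wins, as long as it is one of the documented values.
        if requested not in VALID_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {requested!r}")
        return requested
    # No explicit choice: SDPA when available on torch>=2.1.1, otherwise manual "eager".
    if model_supports_sdpa and parse_version(torch_version) >= (2, 1, 1):
        return "sdpa"
    return "eager"


print(default_attn_implementation(None, "2.1.1", True))  # sdpa
print(default_attn_implementation(None, "2.0.1", True))  # eager
```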

Instantiates one of the model classes of the library (with a vision-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, TFAutoModelForVision2Seq

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModelForVision2Seq.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
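With output_loading_info=True, the extra dictionary reports how the checkpoint's parameters matched the model's. The sketch below derives missing and unexpected keys with set differences. The real loader may do more (for example, handling name prefixes or shape mismatches). `loading_info()` and the parameter names are illustrative, and the dictionary keys simply follow the wording of the docstring.

```python
def loading_info(expected_keys, checkpoint_keys):
    """Compare the parameter names a model expects with those found in a checkpoint."""
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint: these get newly initialized.
        "missing_keys": sorted(expected - found),
        # Present in the checkpoint but unused by the model: these are ignored.
        "unexpected_keys": sorted(found - expected),
        # This sketch performs no checks that can fail, so there are never error messages.
        "error_msgs": [],
    }


info = loading_info(
    expected_keys=["encoder.weight", "head.weight"],
    checkpoint_keys=["encoder.weight", "pooler.weight"],
)
print(info["missing_keys"])     # ['head.weight']
print(info["unexpected_keys"])  # ['pooler.weight']
```

A non-empty missing_keys list is the usual sign that a task head (such as a vision-to-text head) was freshly initialized and needs fine-tuning.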

Instantiate one of the model classes of the library (with a vision-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

Examples:

>>> from transformers import AutoConfig, TFAutoModelForVision2Seq

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = TFAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForVision2Seq.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForVision2Seq

class transformers.FlaxAutoModelForVision2Seq

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a vision-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • VisionEncoderDecoderConfig configuration class: FlaxVisionEncoderDecoderModel (Vision Encoder decoder model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.

Instantiates one of the model classes of the library (with a vision-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForVision2Seq

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModelForVision2Seq.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
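The cache_dir, force_download and local_files_only parameters together decide whether a file is reused from the cache or fetched again. Below is a hypothetical sketch of that decision for a single file; `resolve_file` and `fake_download` are illustrative names, and giving local_files_only precedence over force_download is a choice made by this sketch, not documented library behavior.

```python
import os
import tempfile


def resolve_file(filename, cache_dir, download, force_download=False, local_files_only=False):
    """Return a local path for filename, downloading it only when the flags allow or require it.

    download is a caller-supplied callable(filename, cache_dir) -> path.
    """
    cached = os.path.join(cache_dir, filename)
    if local_files_only:
        # Never touch the network: the file must already be in the cache.
        if os.path.exists(cached):
            return cached
        raise FileNotFoundError(f"{filename} is not cached and local_files_only=True")
    if os.path.exists(cached) and not force_download:
        # Cache hit: reuse the existing copy.
        return cached
    # Cache miss, or force_download=True: (re-)download and overwrite the cached version.
    return download(filename, cache_dir)


calls = []


def fake_download(filename, cache_dir):
    """Stand-in for a network fetch: records the call and writes a placeholder file."""
    calls.append(filename)
    path = os.path.join(cache_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("{}")
    return path


with tempfile.TemporaryDirectory() as cache:
    resolve_file("config.json", cache, fake_download)                       # miss: downloads
    resolve_file("config.json", cache, fake_download)                       # hit: reuses cache
    resolve_file("config.json", cache, fake_download, force_download=True)  # re-downloads
    print(len(calls))  # 2
```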

Instantiate one of the model classes of the library (with a vision-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • vision-encoder-decoder: FlaxVisionEncoderDecoderModel (Vision Encoder decoder model)

Examples:

>>> from transformers import AutoConfig, FlaxAutoModelForVision2Seq

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased")

>>> # Update configuration during loading
>>> model = FlaxAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForVision2Seq.from_pretrained(
...     "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForImageTextToText

class transformers.AutoModelForImageTextToText

( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with an image-text-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config

( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • AriaConfig configuration class: AriaForConditionalGeneration (Aria model)
    • AyaVisionConfig configuration class: AyaVisionForConditionalGeneration (AyaVision model)
    • Blip2Config configuration class: Blip2ForConditionalGeneration (BLIP-2 model)
    • BlipConfig configuration class: BlipForConditionalGeneration (BLIP model)
    • ChameleonConfig configuration class: ChameleonForConditionalGeneration (Chameleon model)
    • Cohere2VisionConfig configuration class: Cohere2VisionForConditionalGeneration (Cohere2Vision model)
    • DeepseekVLConfig configuration class: DeepseekVLForConditionalGeneration (DeepseekVL model)
    • DeepseekVLHybridConfig configuration class: DeepseekVLHybridForConditionalGeneration (DeepseekVLHybrid model)
    • Emu3Config configuration class: Emu3ForConditionalGeneration (Emu3 model)
    • EvollaConfig configuration class: EvollaForProteinText2Text (Evolla model)
    • Florence2Config configuration class: Florence2ForConditionalGeneration (Florence2 model)
    • FuyuConfig configuration class: FuyuForCausalLM (Fuyu model)
    • Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
    • Gemma3nConfig configuration class: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
    • GitConfig configuration class: GitForCausalLM (GIT model)
    • Glm4vConfig configuration class: Glm4vForConditionalGeneration (GLM4V model)
    • Glm4vMoeConfig configuration class: Glm4vMoeForConditionalGeneration (GLM4VMOE model)
    • GotOcr2Config configuration class: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
    • Idefics2Config configuration class: Idefics2ForConditionalGeneration (Idefics2 model)
    • Idefics3Config configuration class: Idefics3ForConditionalGeneration (Idefics3 model)
    • IdeficsConfig configuration class: IdeficsForVisionText2Text (IDEFICS model)
    • InstructBlipConfig configuration class: InstructBlipForConditionalGeneration (InstructBLIP model)
    • InternVLConfig configuration class: InternVLForConditionalGeneration (InternVL model)
    • JanusConfig configuration class: JanusForConditionalGeneration (Janus model)
    • Kosmos2Config configuration class: Kosmos2ForConditionalGeneration (KOSMOS-2 model)
    • Kosmos2_5Config configuration class: Kosmos2_5ForConditionalGeneration (KOSMOS-2.5 model)
    • Llama4Config configuration class: Llama4ForConditionalGeneration (Llama4 model)
    • LlavaConfig configuration class: LlavaForConditionalGeneration (LLaVa model)
    • LlavaNextConfig configuration class: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
    • LlavaNextVideoConfig configuration class: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
    • LlavaOnevisionConfig configuration class: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
    • Mistral3Config configuration class: Mistral3ForConditionalGeneration (Mistral3 model)
    • MllamaConfig configuration class: MllamaForConditionalGeneration (Mllama model)
    • Ovis2Config configuration class: Ovis2ForConditionalGeneration (Ovis2 model)
    • PaliGemmaConfig configuration class: PaliGemmaForConditionalGeneration (PaliGemma model)
    • PerceptionLMConfig configuration class: PerceptionLMForConditionalGeneration (PerceptionLM model)
    • Pix2StructConfig configuration class: Pix2StructForConditionalGeneration (Pix2Struct model)
    • PixtralVisionConfig configuration class: LlavaForConditionalGeneration (Pixtral model)
    • Qwen2VLConfig configuration class: Qwen2VLForConditionalGeneration (Qwen2VL model)
    • Qwen2_5_VLConfig configuration class: Qwen2_5_VLForConditionalGeneration (Qwen2_5_VL model)
    • ShieldGemma2Config configuration class: Gemma3ForConditionalGeneration (Shieldgemma2 model)
    • SmolVLMConfig configuration class: SmolVLMForConditionalGeneration (SmolVLM model)
    • UdopConfig configuration class: UdopForConditionalGeneration (UDOP model)
    • VipLlavaConfig configuration class: VipLlavaForConditionalGeneration (VipLlava model)
    • VisionEncoderDecoderConfig configuration class: VisionEncoderDecoderModel (Vision Encoder decoder model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
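from_config dispatches on the configuration class itself, and the list above shows that the mapping is many-to-one: PixtralVisionConfig and LlavaConfig both resolve to LlavaForConditionalGeneration, and ShieldGemma2Config and Gemma3Config both resolve to Gemma3ForConditionalGeneration. Below is a sketch of that lookup using empty stand-in config classes. The classes, `CONFIG_TO_MODEL`, `UnknownConfig` and `model_class_for()` are hypothetical, and model classes are represented by their names as strings.

```python
class Blip2Config: ...
class LlavaConfig: ...
class PixtralVisionConfig: ...
class Gemma3Config: ...
class ShieldGemma2Config: ...
class UnknownConfig: ...  # stands in for any configuration class missing from the list


# A few entries copied from the list above; several config classes share one model class.
CONFIG_TO_MODEL = {
    Blip2Config: "Blip2ForConditionalGeneration",
    LlavaConfig: "LlavaForConditionalGeneration",
    PixtralVisionConfig: "LlavaForConditionalGeneration",
    Gemma3Config: "Gemma3ForConditionalGeneration",
    ShieldGemma2Config: "Gemma3ForConditionalGeneration",
}


def model_class_for(config):
    """Look up the model class by the exact type of the config object."""
    try:
        return CONFIG_TO_MODEL[type(config)]
    except KeyError:
        raise ValueError(
            f"Unrecognized configuration class {type(config).__name__} for this auto class"
        ) from None


print(model_class_for(PixtralVisionConfig()))  # LlavaForConditionalGeneration
print(model_class_for(ShieldGemma2Config()))   # Gemma3ForConditionalGeneration
```

Because the sketch keys on `type(config)`, a subclass of a listed config would not match; that strictness is a simplification chosen for clarity.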

Instantiates one of the model classes of the library (with an image-text-to-text modeling head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForImageTextToText

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForImageTextToText.from_config(config)

from_pretrained

( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been done already).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
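The two kwargs cases above can be modeled with plain dictionaries. In this simplified sketch the configuration is a dict of known attributes. `route_kwargs` is a hypothetical helper and not a Transformers function. The attribute names and values are illustrative only.

```python
from typing import Any, Dict, Optional, Tuple


def route_kwargs(
    loaded_config: Dict[str, Any],
    kwargs: Dict[str, Any],
    explicit_config: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical model of how **kwargs are routed.

    Returns (config_used, model_init_kwargs) and leaves the inputs unmodified.
    """
    if explicit_config is not None:
        # Case 1: a config was passed in, so every kwarg goes to the model's __init__.
        return dict(explicit_config), dict(kwargs)

    # Case 2: the config is loaded automatically, and matching keys override it.
    config = dict(loaded_config)
    model_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in config:
            config[key] = value        # known attribute: override it
        else:
            model_kwargs[key] = value  # unknown key: forwarded to __init__
    return config, model_kwargs


loaded = {"output_attentions": False, "hidden_size": 768}
cfg, extra = route_kwargs(loaded, {"output_attentions": True, "my_flag": 1})
print(cfg["output_attentions"], extra)  # True {'my_flag': 1}
```

With the real API, this means `from_pretrained(name, output_attentions=True)` updates the loaded configuration. If you also pass `config=...`, the same keyword goes straight to the model constructor instead.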

Instantiate one of the model classes of the library (with an image-text-to-text modeling head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • aria: AriaForConditionalGeneration (Aria model)
  • aya_vision: AyaVisionForConditionalGeneration (AyaVision model)
  • blip: BlipForConditionalGeneration (BLIP model)
  • blip-2: Blip2ForConditionalGeneration (BLIP-2 model)
  • chameleon: ChameleonForConditionalGeneration (Chameleon model)
  • cohere2_vision: Cohere2VisionForConditionalGeneration (Cohere2Vision model)
  • deepseek_vl: DeepseekVLForConditionalGeneration (DeepseekVL model)
  • deepseek_vl_hybrid: DeepseekVLHybridForConditionalGeneration (DeepseekVLHybrid model)
  • emu3: Emu3ForConditionalGeneration (Emu3 model)
  • evolla: EvollaForProteinText2Text (Evolla model)
  • florence2: Florence2ForConditionalGeneration (Florence2 model)
  • fuyu: FuyuForCausalLM (Fuyu model)
  • gemma3: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
  • gemma3n: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
  • git: GitForCausalLM (GIT model)
  • glm4v: Glm4vForConditionalGeneration (GLM4V model)
  • glm4v_moe: Glm4vMoeForConditionalGeneration (GLM4VMOE model)
  • got_ocr2: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
  • idefics: IdeficsForVisionText2Text (IDEFICS model)
  • idefics2: Idefics2ForConditionalGeneration (Idefics2 model)
  • idefics3: Idefics3ForConditionalGeneration (Idefics3 model)
  • instructblip: InstructBlipForConditionalGeneration (InstructBLIP model)
  • internvl: InternVLForConditionalGeneration (InternVL model)
  • janus: JanusForConditionalGeneration (Janus model)
  • kosmos-2: Kosmos2ForConditionalGeneration (KOSMOS-2 model)
  • kosmos-2.5: Kosmos2_5ForConditionalGeneration (KOSMOS-2.5 model)
  • llama4: Llama4ForConditionalGeneration (Llama4 model)
  • llava: LlavaForConditionalGeneration (LLaVa model)
  • llava_next: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
  • llava_next_video: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
  • llava_onevision: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
  • mistral3: Mistral3ForConditionalGeneration (Mistral3 model)
  • mllama: MllamaForConditionalGeneration (Mllama model)
  • ovis2: Ovis2ForConditionalGeneration (Ovis2 model)
  • paligemma: PaliGemmaForConditionalGeneration (PaliGemma model)
  • perception_lm: PerceptionLMForConditionalGeneration (PerceptionLM model)
  • pix2struct: Pix2StructForConditionalGeneration (Pix2Struct model)
  • pixtral: LlavaForConditionalGeneration (Pixtral model)
  • qwen2_5_vl: Qwen2_5_VLForConditionalGeneration (Qwen2_5_VL model)
  • qwen2_vl: Qwen2VLForConditionalGeneration (Qwen2VL model)
  • shieldgemma2: Gemma3ForConditionalGeneration (Shieldgemma2 model)
  • smolvlm: SmolVLMForConditionalGeneration (SmolVLM model)
  • udop: UdopForConditionalGeneration (UDOP model)
  • vipllava: VipLlavaForConditionalGeneration (VipLlava model)
  • vision-encoder-decoder: VisionEncoderDecoderModel (Vision Encoder decoder model)
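The selection rule above can be sketched with a plain dictionary. The rule uses model_type when present and otherwise matches patterns against the name. The mapping below copies a few entries from the list above as strings, and `resolve_class_name` is a hypothetical helper. The sketch tries longer patterns first so that overlapping keys resolve correctly. For example, kosmos-2.5 is not shadowed by its prefix kosmos-2, and llava_next_video is not shadowed by llava. Treat that ordering as an assumption of this sketch.

```python
from typing import Any, Dict

# A few model_type -> class-name entries copied from the list above.
MAPPING: Dict[str, str] = {
    "llava": "LlavaForConditionalGeneration",
    "llava_next": "LlavaNextForConditionalGeneration",
    "llava_next_video": "LlavaNextVideoForConditionalGeneration",
    "kosmos-2": "Kosmos2ForConditionalGeneration",
    "kosmos-2.5": "Kosmos2_5ForConditionalGeneration",
    "pixtral": "LlavaForConditionalGeneration",
}


def resolve_class_name(config: Dict[str, Any], name_or_path: str) -> str:
    """Hypothetical helper that picks a class name from model_type or the name."""
    model_type = config.get("model_type")
    if model_type is not None:
        if model_type not in MAPPING:
            raise ValueError(f"unsupported model_type: {model_type!r}")
        return MAPPING[model_type]
    # Fallback: substring match, trying the longest pattern first.
    for pattern in sorted(MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return MAPPING[pattern]
    raise ValueError(f"cannot infer a model class for {name_or_path!r}")


print(resolve_class_name({"model_type": "pixtral"}, "any/name"))
# LlavaForConditionalGeneration
print(resolve_class_name({}, "./checkpoints/kosmos-2.5-finetuned"))
# Kosmos2_5ForConditionalGeneration
```

The pixtral entry shows that several model_type keys can map to the same class.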

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForImageTextToText

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageTextToText.from_pretrained("Salesforce/blip-image-captioning-base")

>>> # Update configuration during loading
>>> model = AutoModelForImageTextToText.from_pretrained("Salesforce/blip-image-captioning-base", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageTextToText.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

Time Series

AutoModelForTimeSeriesPrediction

class transformers.AutoModelForTimeSeriesPrediction


( *args **kwargs )

This is a generic model class that will be instantiated as one of the model classes of the library (with a time-series prediction head) when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).

from_config


( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • TimesFmConfig configuration class: TimesFmModelForPrediction (TimesFm model)
  • attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
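The class description and the config parameter together describe a factory pattern. The Auto class refuses direct construction, and from_config looks up a model class from the configuration's class. The sketch below uses hypothetical stand-ins (`TimesFmConfigStub`, `BertConfigStub`, `TimesFmModelStub`, `AutoTimeSeriesSketch`), not Transformers classes. It matches on `type(config)` exactly, which is a simplifying assumption. The real Auto classes raise their own error types and messages. The sketch also shows why a configuration class absent from the mapping, such as a BERT configuration, is rejected.

```python
from typing import Dict


class TimesFmConfigStub:
    """Hypothetical stand-in for a configuration class in the mapping."""
    model_type = "timesfm"


class BertConfigStub:
    """Hypothetical stand-in for a configuration class not in the mapping."""
    model_type = "bert"


class TimesFmModelStub:
    """Hypothetical stand-in for a model class built from a config."""

    def __init__(self, config: TimesFmConfigStub) -> None:
        # Only the architecture is defined from the config; no weights are loaded.
        self.config = config


class AutoTimeSeriesSketch:
    """Sketch of an Auto class: a factory that cannot be constructed directly."""

    _mapping: Dict[type, type] = {TimesFmConfigStub: TimesFmModelStub}

    def __init__(self) -> None:
        raise EnvironmentError(
            "Use AutoTimeSeriesSketch.from_config(config) instead of __init__()."
        )

    @classmethod
    def from_config(cls, config: object) -> object:
        # Select the model class from the configuration's class.
        model_cls = cls._mapping.get(type(config))
        if model_cls is None:
            raise ValueError(
                f"Unrecognized configuration class {type(config).__name__}"
            )
        return model_cls(config)


model = AutoTimeSeriesSketch.from_config(TimesFmConfigStub())
print(type(model).__name__)  # TimesFmModelStub
```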

Instantiates one of the model classes of the library (with a time-series prediction head) from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.

Examples:

>>> from transformers import AutoConfig, AutoModelForTimeSeriesPrediction

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google/timesfm-2.0-500m-pytorch")
>>> model = AutoModelForTimeSeriesPrediction.from_config(config)

from_pretrained


( *model_args **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
    • A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model with the provided conversion scripts and then loading the PyTorch model.
  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).
    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.
    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
  • state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.

    This option can be used if you want to create a model from a pretrained configuration but load your own weights. In that case, however, consider whether using save_pretrained() and from_pretrained() would be simpler.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
  • proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be passed directly to the underlying model’s __init__ method (all relevant updates to the configuration are assumed to have been done already).
    • If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
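The accepted forms of pretrained_model_name_or_path can be told apart with simple checks. The classifier below is simplified and hypothetical: `classify_source` is not a Transformers function. It treats an existing directory as a local save and a path ending in `.index` as a TensorFlow checkpoint, which per the description above also needs from_tf=True and an explicit config. Anything else is treated as a Hub model id. The library's own resolution handles more cases, such as URLs. The id `my-org/my-model` is a made-up placeholder.

```python
import os
import tempfile


def classify_source(name_or_path: str) -> str:
    """Hypothetical helper that guesses which documented form a value takes."""
    if os.path.isdir(name_or_path):
        return "local_directory"      # e.g. a directory written by save_pretrained()
    if name_or_path.endswith(".index"):
        return "tf_index_checkpoint"  # needs from_tf=True and config=...
    return "hub_model_id"             # e.g. "namespace/repo-name" on huggingface.co


with tempfile.TemporaryDirectory() as tmp:
    print(classify_source(tmp))                          # local_directory
print(classify_source("./tf_model/model.ckpt.index"))    # tf_index_checkpoint
print(classify_source("my-org/my-model"))                # hub_model_id
```

Checking for a directory first means a local folder whose name looks like a Hub id is still treated as local in this sketch.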

Instantiate one of the model classes of the library (with a time-series prediction head) from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • timesfm: TimesFmModelForPrediction (TimesFm model)

The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Examples:

>>> from transformers import AutoConfig, AutoModelForTimeSeriesPrediction

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch")

>>> # Update configuration during loading
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )