Transformers documentation
Auto Classes
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained() method. Auto classes are here to do this job for you, so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
Instantiating one of AutoConfig, AutoModel, and AutoTokenizer will directly create a class of the relevant architecture. For instance,
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
will create a model that is an instance of BertModel.
There is one class of AutoModel for each task, and for each backend (PyTorch, TensorFlow, or Flax).
Extending the Auto Classes
Each of the auto classes has a method to be extended with your custom classes. For instance, if you have defined a custom class of model NewModel, make sure you have a NewModelConfig, then you can add those to the auto classes like this:
from transformers import AutoConfig, AutoModel
AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)
You will then be able to use the auto classes like you would usually do!
If your NewModelConfig is a subclass of PretrainedConfig, make sure its model_type attribute is set to the same key you use when registering the config (here "new-model"). Likewise, if your NewModel is a subclass of PreTrainedModel, make sure its config_class attribute is set to the same class you use when registering the model (here NewModelConfig).
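Putting both requirements together, a minimal self-contained registration might look like this (NewModel is a toy one-layer model invented purely for illustration, not a real architecture in the library):

```python
import torch.nn as nn
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class NewModelConfig(PretrainedConfig):
    # model_type must match the key used in AutoConfig.register below.
    model_type = "new-model"

    def __init__(self, hidden_size=16, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class NewModel(PreTrainedModel):
    # config_class must match the config class used in AutoModel.register below.
    config_class = NewModelConfig

    def __init__(self, config):
        super().__init__(config)
        self.linear = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, x):
        return self.linear(x)


AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

# The auto classes now resolve the custom type like any built-in one.
model = AutoModel.from_config(AutoConfig.for_model("new-model"))
print(type(model).__name__)  # NewModel
```

Registering the same model_type twice raises an error, so run the registration only once per process.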
AutoConfig
This is a generic configuration class that will be instantiated as one of the configuration classes of the library when created with the from_pretrained() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike[str]] **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
  - A path to a directory containing a configuration file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - A path or url to a saved configuration JSON file, e.g., ./my_model_directory/configuration.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files and override the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, this function returns just the final configuration object. If True, this function returns a Tuple(config, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the part of kwargs which has not been used to update config and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (additional keyword arguments, optional) — The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not configuration attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the configuration classes of the library from a pretrained model configuration.
The configuration class to instantiate is selected based on the model_type property of the config object that
is loaded, or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- aimv2 — Aimv2Config (AIMv2 model)
- aimv2_vision_model — Aimv2VisionConfig (Aimv2VisionModel model)
- albert — AlbertConfig (ALBERT model)
- align — AlignConfig (ALIGN model)
- altclip — AltCLIPConfig (AltCLIP model)
- apertus — ApertusConfig (Apertus model)
- arcee — ArceeConfig (Arcee model)
- aria — AriaConfig (Aria model)
- aria_text — AriaTextConfig (AriaText model)
- audio-spectrogram-transformer — ASTConfig (Audio Spectrogram Transformer model)
- autoformer — AutoformerConfig (Autoformer model)
- aya_vision — AyaVisionConfig (AyaVision model)
- bamba — BambaConfig (Bamba model)
- bark — BarkConfig (Bark model)
- bart — BartConfig (BART model)
- beit — BeitConfig (BEiT model)
- bert — BertConfig (BERT model)
- bert-generation — BertGenerationConfig (Bert Generation model)
- big_bird — BigBirdConfig (BigBird model)
- bigbird_pegasus — BigBirdPegasusConfig (BigBird-Pegasus model)
- biogpt — BioGptConfig (BioGpt model)
- bit — BitConfig (BiT model)
- bitnet — BitNetConfig (BitNet model)
- blenderbot — BlenderbotConfig (Blenderbot model)
- blenderbot-small — BlenderbotSmallConfig (BlenderbotSmall model)
- blip — BlipConfig (BLIP model)
- blip-2 — Blip2Config (BLIP-2 model)
- blip_2_qformer — Blip2QFormerConfig (BLIP-2 QFormer model)
- bloom — BloomConfig (BLOOM model)
- blt — BltConfig (Blt model)
- bridgetower — BridgeTowerConfig (BridgeTower model)
- bros — BrosConfig (BROS model)
- camembert — CamembertConfig (CamemBERT model)
- canine — CanineConfig (CANINE model)
- chameleon — ChameleonConfig (Chameleon model)
- chinese_clip — ChineseCLIPConfig (Chinese-CLIP model)
- chinese_clip_vision_model — ChineseCLIPVisionConfig (ChineseCLIPVisionModel model)
- clap — ClapConfig (CLAP model)
- clip — CLIPConfig (CLIP model)
- clip_text_model — CLIPTextConfig (CLIPTextModel model)
- clip_vision_model — CLIPVisionConfig (CLIPVisionModel model)
- clipseg — CLIPSegConfig (CLIPSeg model)
- clvp — ClvpConfig (CLVP model)
- code_llama — LlamaConfig (CodeLlama model)
- codegen — CodeGenConfig (CodeGen model)
- cohere — CohereConfig (Cohere model)
- cohere2 — Cohere2Config (Cohere2 model)
- cohere2_vision — Cohere2VisionConfig (Cohere2Vision model)
- colpali — ColPaliConfig (ColPali model)
- colqwen2 — ColQwen2Config (ColQwen2 model)
- conditional_detr — ConditionalDetrConfig (Conditional DETR model)
- convbert — ConvBertConfig (ConvBERT model)
- convnext — ConvNextConfig (ConvNeXT model)
- convnextv2 — ConvNextV2Config (ConvNeXTV2 model)
- cpmant — CpmAntConfig (CPM-Ant model)
- csm — CsmConfig (CSM model)
- ctrl — CTRLConfig (CTRL model)
- cvt — CvtConfig (CvT model)
- d_fine — DFineConfig (D-FINE model)
- dab-detr — DabDetrConfig (DAB-DETR model)
- dac — DacConfig (DAC model)
- data2vec-audio — Data2VecAudioConfig (Data2VecAudio model)
- data2vec-text — Data2VecTextConfig (Data2VecText model)
- data2vec-vision — Data2VecVisionConfig (Data2VecVision model)
- dbrx — DbrxConfig (DBRX model)
- deberta — DebertaConfig (DeBERTa model)
- deberta-v2 — DebertaV2Config (DeBERTa-v2 model)
- decision_transformer — DecisionTransformerConfig (Decision Transformer model)
- deepseek_v2 — DeepseekV2Config (DeepSeek-V2 model)
- deepseek_v3 — DeepseekV3Config (DeepSeek-V3 model)
- deepseek_vl — DeepseekVLConfig (DeepseekVL model)
- deepseek_vl_hybrid — DeepseekVLHybridConfig (DeepseekVLHybrid model)
- deformable_detr — DeformableDetrConfig (Deformable DETR model)
- deit — DeiTConfig (DeiT model)
- depth_anything — DepthAnythingConfig (Depth Anything model)
- depth_pro — DepthProConfig (DepthPro model)
- deta — DetaConfig (DETA model)
- detr — DetrConfig (DETR model)
- dia — DiaConfig (Dia model)
- diffllama — DiffLlamaConfig (DiffLlama model)
- dinat — DinatConfig (DiNAT model)
- dinov2 — Dinov2Config (DINOv2 model)
- dinov2_with_registers — Dinov2WithRegistersConfig (DINOv2 with Registers model)
- dinov3_convnext — DINOv3ConvNextConfig (DINOv3 ConvNext model)
- dinov3_vit — DINOv3ViTConfig (DINOv3 ViT model)
- distilbert — DistilBertConfig (DistilBERT model)
- doge — DogeConfig (Doge model)
- donut-swin — DonutSwinConfig (DonutSwin model)
- dots1 — Dots1Config (dots1 model)
- dpr — DPRConfig (DPR model)
- dpt — DPTConfig (DPT model)
- edgetam — EdgeTamConfig (EdgeTAM model)
- edgetam_video — EdgeTamVideoConfig (EdgeTamVideo model)
- edgetam_vision_model — EdgeTamVisionConfig (EdgeTamVisionModel model)
- efficientformer — EfficientFormerConfig (EfficientFormer model)
- efficientloftr — EfficientLoFTRConfig (EfficientLoFTR model)
- efficientnet — EfficientNetConfig (EfficientNet model)
- electra — ElectraConfig (ELECTRA model)
- emu3 — Emu3Config (Emu3 model)
- encodec — EncodecConfig (EnCodec model)
- encoder-decoder — EncoderDecoderConfig (Encoder decoder model)
- eomt — EomtConfig (EoMT model)
- ernie — ErnieConfig (ERNIE model)
- ernie4_5 — Ernie4_5Config (Ernie4_5 model)
- ernie4_5_moe — Ernie4_5_MoeConfig (Ernie4_5_MoE model)
- ernie_m — ErnieMConfig (ErnieM model)
- esm — EsmConfig (ESM model)
- evolla — EvollaConfig (Evolla model)
- exaone4 — Exaone4Config (EXAONE-4.0 model)
- falcon — FalconConfig (Falcon model)
- falcon_h1 — FalconH1Config (FalconH1 model)
- falcon_mamba — FalconMambaConfig (FalconMamba model)
- fastspeech2_conformer — FastSpeech2ConformerConfig (FastSpeech2Conformer model)
- fastspeech2_conformer_with_hifigan — FastSpeech2ConformerWithHifiGanConfig (FastSpeech2ConformerWithHifiGan model)
- flaubert — FlaubertConfig (FlauBERT model)
- flava — FlavaConfig (FLAVA model)
- flex_olmo — FlexOlmoConfig (FlexOlmo model)
- florence2 — Florence2Config (Florence2 model)
- fnet — FNetConfig (FNet model)
- focalnet — FocalNetConfig (FocalNet model)
- fsmt — FSMTConfig (FairSeq Machine-Translation model)
- funnel — FunnelConfig (Funnel Transformer model)
- fuyu — FuyuConfig (Fuyu model)
- gemma — GemmaConfig (Gemma model)
- gemma2 — Gemma2Config (Gemma2 model)
- gemma3 — Gemma3Config (Gemma3ForConditionalGeneration model)
- gemma3_text — Gemma3TextConfig (Gemma3ForCausalLM model)
- gemma3n — Gemma3nConfig (Gemma3nForConditionalGeneration model)
- gemma3n_audio — Gemma3nAudioConfig (Gemma3nAudioEncoder model)
- gemma3n_text — Gemma3nTextConfig (Gemma3nForCausalLM model)
- gemma3n_vision — Gemma3nVisionConfig (TimmWrapperModel model)
- git — GitConfig (GIT model)
- glm — GlmConfig (GLM model)
- glm4 — Glm4Config (GLM4 model)
- glm4_moe — Glm4MoeConfig (Glm4MoE model)
- glm4v — Glm4vConfig (GLM4V model)
- glm4v_moe — Glm4vMoeConfig (GLM4VMOE model)
- glm4v_moe_text — Glm4vMoeTextConfig (GLM4VMOE model)
- glm4v_text — Glm4vTextConfig (GLM4V model)
- glpn — GLPNConfig (GLPN model)
- got_ocr2 — GotOcr2Config (GOT-OCR2 model)
- gpt-sw3 — GPT2Config (GPT-Sw3 model)
- gpt2 — GPT2Config (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeConfig (GPTBigCode model)
- gpt_neo — GPTNeoConfig (GPT Neo model)
- gpt_neox — GPTNeoXConfig (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseConfig (GPT NeoX Japanese model)
- gpt_oss — GptOssConfig (GptOss model)
- gptj — GPTJConfig (GPT-J model)
- gptsan-japanese — GPTSanJapaneseConfig (GPTSAN-japanese model)
- granite — GraniteConfig (Granite model)
- granite_speech — GraniteSpeechConfig (GraniteSpeech model)
- granitemoe — GraniteMoeConfig (GraniteMoeMoe model)
- granitemoehybrid — GraniteMoeHybridConfig (GraniteMoeHybrid model)
- granitemoeshared — GraniteMoeSharedConfig (GraniteMoeSharedMoe model)
- granitevision — LlavaNextConfig (LLaVA-NeXT model)
- graphormer — GraphormerConfig (Graphormer model)
- grounding-dino — GroundingDinoConfig (Grounding DINO model)
- groupvit — GroupViTConfig (GroupViT model)
- helium — HeliumConfig (Helium model)
- hgnet_v2 — HGNetV2Config (HGNet-V2 model)
- hiera — HieraConfig (Hiera model)
- hubert — HubertConfig (Hubert model)
- hunyuan_v1_dense — HunYuanDenseV1Config (HunYuanDenseV1 model)
- hunyuan_v1_moe — HunYuanMoEV1Config (HunYuanMoeV1 model)
- ibert — IBertConfig (I-BERT model)
- idefics — IdeficsConfig (IDEFICS model)
- idefics2 — Idefics2Config (Idefics2 model)
- idefics3 — Idefics3Config (Idefics3 model)
- idefics3_vision — Idefics3VisionConfig (Idefics3VisionTransformer model)
- ijepa — IJepaConfig (I-JEPA model)
- imagegpt — ImageGPTConfig (ImageGPT model)
- informer — InformerConfig (Informer model)
- instructblip — InstructBlipConfig (InstructBLIP model)
- instructblipvideo — InstructBlipVideoConfig (InstructBlipVideo model)
- internvl — InternVLConfig (InternVL model)
- internvl_vision — InternVLVisionConfig (InternVLVision model)
- jamba — JambaConfig (Jamba model)
- janus — JanusConfig (Janus model)
- jetmoe — JetMoeConfig (JetMoe model)
- jukebox — JukeboxConfig (Jukebox model)
- kosmos-2 — Kosmos2Config (KOSMOS-2 model)
- kosmos-2.5 — Kosmos2_5Config (KOSMOS-2.5 model)
- kyutai_speech_to_text — KyutaiSpeechToTextConfig (KyutaiSpeechToText model)
- layoutlm — LayoutLMConfig (LayoutLM model)
- layoutlmv2 — LayoutLMv2Config (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Config (LayoutLMv3 model)
- led — LEDConfig (LED model)
- levit — LevitConfig (LeViT model)
- lfm2 — Lfm2Config (Lfm2 model)
- lfm2_vl — Lfm2VlConfig (Lfm2Vl model)
- lightglue — LightGlueConfig (LightGlue model)
- lilt — LiltConfig (LiLT model)
- llama — LlamaConfig (LLaMA model)
- llama4 — Llama4Config (Llama4 model)
- llama4_text — Llama4TextConfig (Llama4ForCausalLM model)
- llava — LlavaConfig (LLaVa model)
- llava_next — LlavaNextConfig (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoConfig (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionConfig (LLaVA-Onevision model)
- longcat_flash — LongcatFlashConfig (LongCatFlash model)
- longformer — LongformerConfig (Longformer model)
- longt5 — LongT5Config (LongT5 model)
- luke — LukeConfig (LUKE model)
- lxmert — LxmertConfig (LXMERT model)
- m2m_100 — M2M100Config (M2M100 model)
- mamba — MambaConfig (Mamba model)
- mamba2 — Mamba2Config (mamba2 model)
- marian — MarianConfig (Marian model)
- markuplm — MarkupLMConfig (MarkupLM model)
- mask2former — Mask2FormerConfig (Mask2Former model)
- maskformer — MaskFormerConfig (MaskFormer model)
- maskformer-swin — MaskFormerSwinConfig (MaskFormerSwin model)
- mbart — MBartConfig (mBART model)
- mctct — MCTCTConfig (M-CTC-T model)
- mega — MegaConfig (MEGA model)
- megatron-bert — MegatronBertConfig (Megatron-BERT model)
- metaclip_2 — MetaClip2Config (MetaCLIP 2 model)
- mgp-str — MgpstrConfig (MGP-STR model)
- mimi — MimiConfig (Mimi model)
- minimax — MiniMaxConfig (MiniMax model)
- ministral — MinistralConfig (Ministral model)
- mistral — MistralConfig (Mistral model)
- mistral3 — Mistral3Config (Mistral3 model)
- mixtral — MixtralConfig (Mixtral model)
- mlcd — MLCDVisionConfig (MLCD model)
- mllama — MllamaConfig (Mllama model)
- mm-grounding-dino — MMGroundingDinoConfig (MM Grounding DINO model)
- mobilebert — MobileBertConfig (MobileBERT model)
- mobilenet_v1 — MobileNetV1Config (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2Config (MobileNetV2 model)
- mobilevit — MobileViTConfig (MobileViT model)
- mobilevitv2 — MobileViTV2Config (MobileViTV2 model)
- modernbert — ModernBertConfig (ModernBERT model)
- modernbert-decoder — ModernBertDecoderConfig (ModernBertDecoder model)
- moonshine — MoonshineConfig (Moonshine model)
- moshi — MoshiConfig (Moshi model)
- mpnet — MPNetConfig (MPNet model)
- mpt — MptConfig (MPT model)
- mra — MraConfig (MRA model)
- mt5 — MT5Config (MT5 model)
- musicgen — MusicgenConfig (MusicGen model)
- musicgen_melody — MusicgenMelodyConfig (MusicGen Melody model)
- mvp — MvpConfig (MVP model)
- nat — NatConfig (NAT model)
- nemotron — NemotronConfig (Nemotron model)
- nezha — NezhaConfig (Nezha model)
- nllb-moe — NllbMoeConfig (NLLB-MOE model)
- nougat — VisionEncoderDecoderConfig (Nougat model)
- nystromformer — NystromformerConfig (Nyströmformer model)
- olmo — OlmoConfig (OLMo model)
- olmo2 — Olmo2Config (OLMo2 model)
- olmo3 — Olmo3Config (Olmo3 model)
- olmoe — OlmoeConfig (OLMoE model)
- omdet-turbo — OmDetTurboConfig (OmDet-Turbo model)
- oneformer — OneFormerConfig (OneFormer model)
- open-llama — OpenLlamaConfig (OpenLlama model)
- openai-gpt — OpenAIGPTConfig (OpenAI GPT model)
- opt — OPTConfig (OPT model)
- ovis2 — Ovis2Config (Ovis2 model)
- owlv2 — Owlv2Config (OWLv2 model)
- owlvit — OwlViTConfig (OWL-ViT model)
- paligemma — PaliGemmaConfig (PaliGemma model)
- parakeet_ctc — ParakeetCTCConfig (Parakeet model)
- parakeet_encoder — ParakeetEncoderConfig (ParakeetEncoder model)
- patchtsmixer — PatchTSMixerConfig (PatchTSMixer model)
- patchtst — PatchTSTConfig (PatchTST model)
- pegasus — PegasusConfig (Pegasus model)
- pegasus_x — PegasusXConfig (PEGASUS-X model)
- perceiver — PerceiverConfig (Perceiver model)
- perception_encoder — TimmWrapperConfig (PerceptionEncoder model)
- perception_lm — PerceptionLMConfig (PerceptionLM model)
- persimmon — PersimmonConfig (Persimmon model)
- phi — PhiConfig (Phi model)
- phi3 — Phi3Config (Phi3 model)
- phi4_multimodal — Phi4MultimodalConfig (Phi4Multimodal model)
- phimoe — PhimoeConfig (Phimoe model)
- pix2struct — Pix2StructConfig (Pix2Struct model)
- pixtral — PixtralVisionConfig (Pixtral model)
- plbart — PLBartConfig (PLBart model)
- poolformer — PoolFormerConfig (PoolFormer model)
- pop2piano — Pop2PianoConfig (Pop2Piano model)
- prompt_depth_anything — PromptDepthAnythingConfig (PromptDepthAnything model)
- prophetnet — ProphetNetConfig (ProphetNet model)
- pvt — PvtConfig (PVT model)
- pvt_v2 — PvtV2Config (PVTv2 model)
- qdqbert — QDQBertConfig (QDQBert model)
- qwen2 — Qwen2Config (Qwen2 model)
- qwen2_5_omni — Qwen2_5OmniConfig (Qwen2_5Omni model)
- qwen2_5_vl — Qwen2_5_VLConfig (Qwen2_5_VL model)
- qwen2_5_vl_text — Qwen2_5_VLTextConfig (Qwen2_5_VL model)
- qwen2_audio — Qwen2AudioConfig (Qwen2Audio model)
- qwen2_audio_encoder — Qwen2AudioEncoderConfig (Qwen2AudioEncoder model)
- qwen2_moe — Qwen2MoeConfig (Qwen2MoE model)
- qwen2_vl — Qwen2VLConfig (Qwen2VL model)
- qwen2_vl_text — Qwen2VLTextConfig (Qwen2VL model)
- qwen3 — Qwen3Config (Qwen3 model)
- qwen3_moe — Qwen3MoeConfig (Qwen3MoE model)
- qwen3_next — Qwen3NextConfig (Qwen3Next model)
- qwen3_omni_moe — Qwen3OmniMoeConfig (Qwen3OmniMoE model)
- qwen3_vl — Qwen3VLConfig (Qwen3VL model)
- qwen3_vl_moe — Qwen3VLMoeConfig (Qwen3VLMoe model)
- qwen3_vl_moe_text — Qwen3VLMoeTextConfig (Qwen3VLMoe model)
- qwen3_vl_text — Qwen3VLTextConfig (Qwen3VL model)
- rag — RagConfig (RAG model)
- realm — RealmConfig (REALM model)
- recurrent_gemma — RecurrentGemmaConfig (RecurrentGemma model)
- reformer — ReformerConfig (Reformer model)
- regnet — RegNetConfig (RegNet model)
- rembert — RemBertConfig (RemBERT model)
- resnet — ResNetConfig (ResNet model)
- retribert — RetriBertConfig (RetriBERT model)
- roberta — RobertaConfig (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormConfig (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertConfig (RoCBert model)
- roformer — RoFormerConfig (RoFormer model)
- rt_detr — RTDetrConfig (RT-DETR model)
- rt_detr_resnet — RTDetrResNetConfig (RT-DETR-ResNet model)
- rt_detr_v2 — RTDetrV2Config (RT-DETRv2 model)
- rwkv — RwkvConfig (RWKV model)
- sam — SamConfig (SAM model)
- sam2 — Sam2Config (SAM2 model)
- sam2_hiera_det_model — Sam2HieraDetConfig (Sam2HieraDetModel model)
- sam2_video — Sam2VideoConfig (Sam2VideoModel model)
- sam2_vision_model — Sam2VisionConfig (Sam2VisionModel model)
- sam_hq — SamHQConfig (SAM-HQ model)
- sam_hq_vision_model — SamHQVisionConfig (SamHQVisionModel model)
- sam_vision_model — SamVisionConfig (SamVisionModel model)
- seamless_m4t — SeamlessM4TConfig (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2Config (SeamlessM4Tv2 model)
- seed_oss — SeedOssConfig (SeedOss model)
- segformer — SegformerConfig (SegFormer model)
- seggpt — SegGptConfig (SegGPT model)
- sew — SEWConfig (SEW model)
- sew-d — SEWDConfig (SEW-D model)
- shieldgemma2 — ShieldGemma2Config (Shieldgemma2 model)
- siglip — SiglipConfig (SigLIP model)
- siglip2 — Siglip2Config (SigLIP2 model)
- siglip2_vision_model — Siglip2VisionConfig (Siglip2VisionModel model)
- siglip_vision_model — SiglipVisionConfig (SiglipVisionModel model)
- smollm3 — SmolLM3Config (SmolLM3 model)
- smolvlm — SmolVLMConfig (SmolVLM model)
- smolvlm_vision — SmolVLMVisionConfig (SmolVLMVisionTransformer model)
- speech-encoder-decoder — SpeechEncoderDecoderConfig (Speech Encoder decoder model)
- speech_to_text — Speech2TextConfig (Speech2Text model)
- speech_to_text_2 — Speech2Text2Config (Speech2Text2 model)
- speecht5 — SpeechT5Config (SpeechT5 model)
- splinter — SplinterConfig (Splinter model)
- squeezebert — SqueezeBertConfig (SqueezeBERT model)
- stablelm — StableLmConfig (StableLm model)
- starcoder2 — Starcoder2Config (Starcoder2 model)
- superglue — SuperGlueConfig (SuperGlue model)
- superpoint — SuperPointConfig (SuperPoint model)
- swiftformer — SwiftFormerConfig (SwiftFormer model)
- swin — SwinConfig (Swin Transformer model)
- swin2sr — Swin2SRConfig (Swin2SR model)
- swinv2 — Swinv2Config (Swin Transformer V2 model)
- switch_transformers — SwitchTransformersConfig (SwitchTransformers model)
- t5 — T5Config (T5 model)
- t5gemma — T5GemmaConfig (T5Gemma model)
- table-transformer — TableTransformerConfig (Table Transformer model)
- tapas — TapasConfig (TAPAS model)
- textnet — TextNetConfig (TextNet model)
- time_series_transformer — TimeSeriesTransformerConfig (Time Series Transformer model)
- timesfm — TimesFmConfig (TimesFm model)
- timesformer — TimesformerConfig (TimeSformer model)
- timm_backbone — TimmBackboneConfig (TimmBackbone model)
- timm_wrapper — TimmWrapperConfig (TimmWrapperModel model)
- trajectory_transformer — TrajectoryTransformerConfig (Trajectory Transformer model)
- transfo-xl — TransfoXLConfig (Transformer-XL model)
- trocr — TrOCRConfig (TrOCR model)
- tvlt — TvltConfig (TVLT model)
- tvp — TvpConfig (TVP model)
- udop — UdopConfig (UDOP model)
- umt5 — UMT5Config (UMT5 model)
- unispeech — UniSpeechConfig (UniSpeech model)
- unispeech-sat — UniSpeechSatConfig (UniSpeechSat model)
- univnet — UnivNetConfig (UnivNet model)
- upernet — UperNetConfig (UPerNet model)
- van — VanConfig (VAN model)
- vaultgemma — VaultGemmaConfig (VaultGemma model)
- video_llava — VideoLlavaConfig (VideoLlava model)
- videomae — VideoMAEConfig (VideoMAE model)
- vilt — ViltConfig (ViLT model)
- vipllava — VipLlavaConfig (VipLlava model)
- vision-encoder-decoder — VisionEncoderDecoderConfig (Vision Encoder decoder model)
- vision-text-dual-encoder — VisionTextDualEncoderConfig (VisionTextDualEncoder model)
- visual_bert — VisualBertConfig (VisualBERT model)
- vit — ViTConfig (ViT model)
- vit_hybrid — ViTHybridConfig (ViT Hybrid model)
- vit_mae — ViTMAEConfig (ViTMAE model)
- vit_msn — ViTMSNConfig (ViTMSN model)
- vitdet — VitDetConfig (VitDet model)
- vitmatte — VitMatteConfig (ViTMatte model)
- vitpose — VitPoseConfig (ViTPose model)
- vitpose_backbone — VitPoseBackboneConfig (ViTPoseBackbone model)
- vits — VitsConfig (VITS model)
- vivit — VivitConfig (ViViT model)
- vjepa2 — VJEPA2Config (VJEPA2Model model)
- voxtral — VoxtralConfig (Voxtral model)
- voxtral_encoder — VoxtralEncoderConfig (Voxtral Encoder model)
- wav2vec2 — Wav2Vec2Config (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertConfig (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerConfig (Wav2Vec2-Conformer model)
- wavlm — WavLMConfig (WavLM model)
- whisper — WhisperConfig (Whisper model)
- xclip — XCLIPConfig (X-CLIP model)
- xcodec — XcodecConfig (X-CODEC model)
- xglm — XGLMConfig (XGLM model)
- xlm — XLMConfig (XLM model)
- xlm-prophetnet — XLMProphetNetConfig (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaConfig (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLConfig (XLM-RoBERTa-XL model)
- xlnet — XLNetConfig (XLNet model)
- xlstm — xLSTMConfig (xLSTM model)
- xmod — XmodConfig (X-MOD model)
- yolos — YolosConfig (YOLOS model)
- yoso — YosoConfig (YOSO model)
- zamba — ZambaConfig (Zamba model)
- zamba2 — Zamba2Config (Zamba2 model)
- zoedepth — ZoeDepthConfig (ZoeDepth model)
Examples:
>>> from transformers import AutoConfig
>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
>>> # Download configuration from huggingface.co (user-uploaded) and cache.
>>> config = AutoConfig.from_pretrained("dbmdz/bert-base-german-cased")
>>> # If configuration file is in a directory (e.g., was saved using *save_pretrained('./test/saved_model/')*).
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/")
>>> # Load a specific configuration file.
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/my_configuration.json")
>>> # Change some config attributes when loading a pretrained config.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
>>> config.output_attentions
True
>>> config, unused_kwargs = AutoConfig.from_pretrained(
... "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
... )
>>> config.output_attentions
True
>>> unused_kwargs
{'foo': False}
register
< source >( model_type config exist_ok = False )
Parameters
- model_type (str) — The model type like "bert" or "gpt".
- config (PretrainedConfig) — The config to register.
Register a new configuration for this class.
AutoTokenizer
This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained() class method.
This class cannot be instantiated directly using __init__() (throws an error).
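That restriction is easy to verify: calling the constructor directly raises immediately (a quick sanity check, not something you would do in real code):

```python
from transformers import AutoTokenizer

# AutoTokenizer is a dispatcher, not a concrete tokenizer class, so its
# constructor raises an EnvironmentError pointing you to from_pretrained().
try:
    tokenizer = AutoTokenizer()
except EnvironmentError as exc:
    print(f"Direct instantiation fails as documented: {exc}")
```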
from_pretrained
< source >( pretrained_model_name_or_path *inputs **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co.
  - A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - A path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like Bert or XLNet), e.g.: ./my_model_directory/vocab.txt. (Not applicable to all derived classes)
- inputs (additional positional arguments, optional) — Will be passed along to the Tokenizer __init__() method.
- config (PretrainedConfig, optional) — The configuration object used to determine the tokenizer class to instantiate.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files and override the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- subfolder (str, optional) — In case the relevant files are located inside a subfolder of the model repo on huggingface.co (e.g. for facebook/rag-token-base), specify it here.
- use_fast (bool, optional, defaults to True) — Use a fast Rust-based tokenizer if it is supported for a given model. If a fast tokenizer is not available for a given model, a normal Python-based tokenizer is returned instead.
- tokenizer_type (str, optional) — Tokenizer type to be loaded.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (additional keyword arguments, optional) — Will be passed to the Tokenizer __init__() method. Can be used to set special tokens like bos_token, eos_token, unk_token, sep_token, pad_token, cls_token, mask_token, additional_special_tokens. See parameters in the __init__() for more details.
Instantiate one of the tokenizer classes of the library from a pretrained model vocabulary.
The tokenizer class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- aimv2 — CLIPTokenizer or CLIPTokenizerFast (AIMv2 model)
- albert — AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
- align — BertTokenizer or BertTokenizerFast (ALIGN model)
- arcee —
LlamaTokenizerorLlamaTokenizerFast(Arcee model) - aria —
LlamaTokenizerorLlamaTokenizerFast(Aria model) - aya_vision —
CohereTokenizerFast(AyaVision model) - bark — BertTokenizer or BertTokenizerFast (Bark model)
- bart — BartTokenizer or BartTokenizerFast (BART model)
- barthez — BarthezTokenizer or BarthezTokenizerFast (BARThez model)
- bartpho — BartphoTokenizer (BARTpho model)
- bert — BertTokenizer or BertTokenizerFast (BERT model)
- bert-generation — BertGenerationTokenizer (Bert Generation model)
- bert-japanese — BertJapaneseTokenizer (BertJapanese model)
- bertweet — BertweetTokenizer (BERTweet model)
- big_bird — BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
- bigbird_pegasus —
PegasusTokenizerorPegasusTokenizerFast(BigBird-Pegasus model) - biogpt — BioGptTokenizer (BioGpt model)
- bitnet — PreTrainedTokenizerFast (BitNet model)
- blenderbot — BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
- blenderbot-small — BlenderbotSmallTokenizer (BlenderbotSmall model)
- blip — BertTokenizer or BertTokenizerFast (BLIP model)
- blip-2 —
GPT2TokenizerorGPT2TokenizerFast(BLIP-2 model) - bloom — BloomTokenizerFast (BLOOM model)
- blt — PreTrainedTokenizerFast (Blt model)
- bridgetower —
RobertaTokenizerorRobertaTokenizerFast(BridgeTower model) - bros — BertTokenizer or BertTokenizerFast (BROS model)
- byt5 — ByT5Tokenizer (ByT5 model)
- camembert — CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
- canine — CanineTokenizer (CANINE model)
- chameleon —
LlamaTokenizerorLlamaTokenizerFast(Chameleon model) - chinese_clip — BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
- clap —
RobertaTokenizerorRobertaTokenizerFast(CLAP model) - clip — CLIPTokenizer or CLIPTokenizerFast (CLIP model)
- clipseg — CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
- clvp — ClvpTokenizer (CLVP model)
- code_llama — CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
- codegen — CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
- cohere —
CohereTokenizerFast(Cohere model) - cohere2 —
CohereTokenizerFast(Cohere2 model) - colpali —
LlamaTokenizerorLlamaTokenizerFast(ColPali model) - colqwen2 —
Qwen2TokenizerorQwen2TokenizerFast(ColQwen2 model) - convbert — ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
- cpm — CpmTokenizer or CpmTokenizerFast (CPM model)
- cpmant — CpmAntTokenizer (CPM-Ant model)
- csm — PreTrainedTokenizerFast (CSM model)
- ctrl — CTRLTokenizer (CTRL model)
- data2vec-audio — Wav2Vec2CTCTokenizer (Data2VecAudio model)
- data2vec-text — RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
- dbrx — GPT2Tokenizer or GPT2TokenizerFast (DBRX model)
- deberta — DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
- deberta-v2 — DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
- deepseek_v2 — LlamaTokenizer or LlamaTokenizerFast (DeepSeek-V2 model)
- deepseek_v3 — LlamaTokenizer or LlamaTokenizerFast (DeepSeek-V3 model)
- deepseek_vl — LlamaTokenizer or LlamaTokenizerFast (DeepseekVL model)
- deepseek_vl_hybrid — LlamaTokenizer or LlamaTokenizerFast (DeepseekVLHybrid model)
- dia — DiaTokenizer (Dia model)
- diffllama — LlamaTokenizer or LlamaTokenizerFast (DiffLlama model)
- distilbert — DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
- dpr — DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
- electra — ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
- emu3 — GPT2Tokenizer or GPT2TokenizerFast (Emu3 model)
- ernie — BertTokenizer or BertTokenizerFast (ERNIE model)
- ernie4_5 — LlamaTokenizerFast (Ernie4_5 model)
- ernie4_5_moe — LlamaTokenizerFast (Ernie4_5_MoE model)
- ernie_m — ErnieMTokenizer (ErnieM model)
- esm — EsmTokenizer (ESM model)
- exaone4 — GPT2Tokenizer or GPT2TokenizerFast (EXAONE-4.0 model)
- falcon — PreTrainedTokenizerFast (Falcon model)
- falcon_mamba — GPTNeoXTokenizerFast (FalconMamba model)
- fastspeech2_conformer — (FastSpeech2Conformer model)
- flaubert — FlaubertTokenizer (FlauBERT model)
- flex_olmo — GPT2TokenizerFast (FlexOlmo model)
- fnet — FNetTokenizer or FNetTokenizerFast (FNet model)
- fsmt — FSMTTokenizer (FairSeq Machine-Translation model)
- funnel — FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
- gemma — GemmaTokenizer or GemmaTokenizerFast (Gemma model)
- gemma2 — GemmaTokenizer or GemmaTokenizerFast (Gemma2 model)
- gemma3 — GemmaTokenizer or GemmaTokenizerFast (Gemma3ForConditionalGeneration model)
- gemma3_text — GemmaTokenizer or GemmaTokenizerFast (Gemma3ForCausalLM model)
- gemma3n — GemmaTokenizer or GemmaTokenizerFast (Gemma3nForConditionalGeneration model)
- gemma3n_text — GemmaTokenizer or GemmaTokenizerFast (Gemma3nForCausalLM model)
- git — BertTokenizer or BertTokenizerFast (GIT model)
- glm — PreTrainedTokenizerFast (GLM model)
- glm4 — PreTrainedTokenizerFast (GLM4 model)
- glm4_moe — PreTrainedTokenizerFast (Glm4MoE model)
- glm4v — PreTrainedTokenizerFast (GLM4V model)
- glm4v_moe — PreTrainedTokenizerFast (GLM4VMOE model)
- gpt-sw3 — GPTSw3Tokenizer (GPT-Sw3 model)
- gpt2 — GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
- gpt_bigcode — GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
- gpt_neo — GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
- gpt_neox — GPTNeoXTokenizerFast (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
- gpt_oss — PreTrainedTokenizerFast (GptOss model)
- gptj — GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
- gptsan-japanese — GPTSanJapaneseTokenizer (GPTSAN-japanese model)
- granite — GPT2Tokenizer (Granite model)
- granitemoe — GPT2Tokenizer (GraniteMoeMoe model)
- granitemoehybrid — GPT2Tokenizer (GraniteMoeHybrid model)
- granitemoeshared — GPT2Tokenizer (GraniteMoeSharedMoe model)
- grounding-dino — BertTokenizer or BertTokenizerFast (Grounding DINO model)
- groupvit — CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
- helium — PreTrainedTokenizerFast (Helium model)
- herbert — HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
- hubert — Wav2Vec2CTCTokenizer (Hubert model)
- ibert — RobertaTokenizer or RobertaTokenizerFast (I-BERT model)
- idefics — LlamaTokenizerFast (IDEFICS model)
- idefics2 — LlamaTokenizer or LlamaTokenizerFast (Idefics2 model)
- idefics3 — LlamaTokenizer or LlamaTokenizerFast (Idefics3 model)
- instructblip — GPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
- instructblipvideo — GPT2Tokenizer or GPT2TokenizerFast (InstructBlipVideo model)
- internvl — Qwen2Tokenizer or Qwen2TokenizerFast (InternVL model)
- jamba — LlamaTokenizer or LlamaTokenizerFast (Jamba model)
- janus — LlamaTokenizerFast (Janus model)
- jetmoe — LlamaTokenizer or LlamaTokenizerFast (JetMoe model)
- jukebox — JukeboxTokenizer (Jukebox model)
- kosmos-2 — XLMRobertaTokenizer or XLMRobertaTokenizerFast (KOSMOS-2 model)
- kosmos-2.5 — PreTrainedTokenizerFast (KOSMOS-2.5 model)
- layoutlm — LayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
- layoutlmv2 — LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
- layoutxlm — LayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
- led — LEDTokenizer or LEDTokenizerFast (LED model)
- lilt — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
- llama — LlamaTokenizer or LlamaTokenizerFast (LLaMA model)
- llama4 — LlamaTokenizer or LlamaTokenizerFast (Llama4 model)
- llama4_text — LlamaTokenizer or LlamaTokenizerFast (Llama4ForCausalLM model)
- llava — LlamaTokenizer or LlamaTokenizerFast (LLaVa model)
- llava_next — LlamaTokenizer or LlamaTokenizerFast (LLaVA-NeXT model)
- llava_next_video — LlamaTokenizer or LlamaTokenizerFast (LLaVa-NeXT-Video model)
- llava_onevision — LlamaTokenizer or LlamaTokenizerFast (LLaVA-Onevision model)
- longformer — LongformerTokenizer or LongformerTokenizerFast (Longformer model)
- longt5 — T5Tokenizer or T5TokenizerFast (LongT5 model)
- luke — LukeTokenizer (LUKE model)
- lxmert — LxmertTokenizer or LxmertTokenizerFast (LXMERT model)
- m2m_100 — M2M100Tokenizer (M2M100 model)
- mamba — GPTNeoXTokenizerFast (Mamba model)
- mamba2 — GPTNeoXTokenizerFast (mamba2 model)
- marian — MarianTokenizer (Marian model)
- mbart — MBartTokenizer or MBartTokenizerFast (mBART model)
- mbart50 — MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
- mega — RobertaTokenizer or RobertaTokenizerFast (MEGA model)
- megatron-bert — BertTokenizer or BertTokenizerFast (Megatron-BERT model)
- metaclip_2 — XLMRobertaTokenizer or XLMRobertaTokenizerFast (MetaCLIP 2 model)
- mgp-str — MgpstrTokenizer (MGP-STR model)
- minimax — GPT2Tokenizer or GPT2TokenizerFast (MiniMax model)
- mistral — MistralCommonTokenizer (Mistral model)
- mixtral — MistralCommonTokenizer (Mixtral model)
- mllama — LlamaTokenizer or LlamaTokenizerFast (Mllama model)
- mluke — MLukeTokenizer (mLUKE model)
- mm-grounding-dino — BertTokenizer or BertTokenizerFast (MM Grounding DINO model)
- mobilebert — MobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
- modernbert — PreTrainedTokenizerFast (ModernBERT model)
- moonshine — PreTrainedTokenizerFast (Moonshine model)
- moshi — PreTrainedTokenizerFast (Moshi model)
- mpnet — MPNetTokenizer or MPNetTokenizerFast (MPNet model)
- mpt — GPTNeoXTokenizerFast (MPT model)
- mra — RobertaTokenizer or RobertaTokenizerFast (MRA model)
- mt5 — MT5Tokenizer or MT5TokenizerFast (MT5 model)
- musicgen — T5Tokenizer or T5TokenizerFast (MusicGen model)
- musicgen_melody — T5Tokenizer or T5TokenizerFast (MusicGen Melody model)
- mvp — MvpTokenizer or MvpTokenizerFast (MVP model)
- myt5 — MyT5Tokenizer (myt5 model)
- nemotron — PreTrainedTokenizerFast (Nemotron model)
- nezha — BertTokenizer or BertTokenizerFast (Nezha model)
- nllb — NllbTokenizer or NllbTokenizerFast (NLLB model)
- nllb-moe — NllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
- nystromformer — AlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
- olmo — GPTNeoXTokenizerFast (OLMo model)
- olmo2 — GPTNeoXTokenizerFast (OLMo2 model)
- olmo3 — GPT2TokenizerFast (Olmo3 model)
- olmoe — GPTNeoXTokenizerFast (OLMoE model)
- omdet-turbo — CLIPTokenizer or CLIPTokenizerFast (OmDet-Turbo model)
- oneformer — CLIPTokenizer or CLIPTokenizerFast (OneFormer model)
- openai-gpt — OpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
- opt — GPT2Tokenizer or GPT2TokenizerFast (OPT model)
- owlv2 — CLIPTokenizer or CLIPTokenizerFast (OWLv2 model)
- owlvit — CLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
- paligemma — LlamaTokenizer or LlamaTokenizerFast (PaliGemma model)
- parakeet — ParakeetCTCTokenizer (Parakeet model)
- pegasus — PegasusTokenizer or PegasusTokenizerFast (Pegasus model)
- pegasus_x — PegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
- perceiver — PerceiverTokenizer (Perceiver model)
- persimmon — LlamaTokenizer or LlamaTokenizerFast (Persimmon model)
- phi — CodeGenTokenizer or CodeGenTokenizerFast (Phi model)
- phi3 — LlamaTokenizer or LlamaTokenizerFast (Phi3 model)
- phimoe — LlamaTokenizer or LlamaTokenizerFast (Phimoe model)
- phobert — PhobertTokenizer (PhoBERT model)
- pix2struct — T5Tokenizer or T5TokenizerFast (Pix2Struct model)
- pixtral — MistralCommonTokenizer (Pixtral model)
- plbart — PLBartTokenizer (PLBart model)
- prophetnet — ProphetNetTokenizer (ProphetNet model)
- qdqbert — BertTokenizer or BertTokenizerFast (QDQBert model)
- qwen2 — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2 model)
- qwen2_5_omni — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2_5Omni model)
- qwen2_5_vl — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2_5_VL model)
- qwen2_audio — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2Audio model)
- qwen2_moe — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2MoE model)
- qwen2_vl — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2VL model)
- qwen3 — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3 model)
- qwen3_moe — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3MoE model)
- qwen3_next — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3Next model)
- qwen3_omni_moe — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3OmniMoE model)
- qwen3_vl — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3VL model)
- qwen3_vl_moe — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen3VLMoe model)
- rag — RagTokenizer (RAG model)
- realm — RealmTokenizer or RealmTokenizerFast (REALM model)
- recurrent_gemma — GemmaTokenizer or GemmaTokenizerFast (RecurrentGemma model)
- reformer — ReformerTokenizer or ReformerTokenizerFast (Reformer model)
- rembert — RemBertTokenizer or RemBertTokenizerFast (RemBERT model)
- retribert — RetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
- roberta — RobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
- roberta-prelayernorm — RobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertTokenizer (RoCBert model)
- roformer — RoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
- rwkv — GPTNeoXTokenizerFast (RWKV model)
- seamless_m4t — SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4Tv2 model)
- shieldgemma2 — GemmaTokenizer or GemmaTokenizerFast (Shieldgemma2 model)
- siglip — SiglipTokenizer (SigLIP model)
- siglip2 — GemmaTokenizer or GemmaTokenizerFast (SigLIP2 model)
- smollm3 — PreTrainedTokenizerFast (SmolLM3 model)
- speech_to_text — Speech2TextTokenizer (Speech2Text model)
- speech_to_text_2 — Speech2Text2Tokenizer (Speech2Text2 model)
- speecht5 — SpeechT5Tokenizer (SpeechT5 model)
- splinter — SplinterTokenizer or SplinterTokenizerFast (Splinter model)
- squeezebert — SqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
- stablelm — GPTNeoXTokenizerFast (StableLm model)
- starcoder2 — GPT2Tokenizer or GPT2TokenizerFast (Starcoder2 model)
- switch_transformers — T5Tokenizer or T5TokenizerFast (SwitchTransformers model)
- t5 — T5Tokenizer or T5TokenizerFast (T5 model)
- t5gemma — GemmaTokenizer or GemmaTokenizerFast (T5Gemma model)
- tapas — TapasTokenizer (TAPAS model)
- tapex — TapexTokenizer (TAPEX model)
- transfo-xl — TransfoXLTokenizer (Transformer-XL model)
- tvp — BertTokenizer or BertTokenizerFast (TVP model)
- udop — UdopTokenizer or UdopTokenizerFast (UDOP model)
- umt5 — T5Tokenizer or T5TokenizerFast (UMT5 model)
- video_llava — LlamaTokenizer or LlamaTokenizerFast (VideoLlava model)
- vilt — BertTokenizer or BertTokenizerFast (ViLT model)
- vipllava — LlamaTokenizer or LlamaTokenizerFast (VipLlava model)
- visual_bert — BertTokenizer or BertTokenizerFast (VisualBERT model)
- vits — VitsTokenizer (VITS model)
- voxtral — MistralCommonTokenizer (Voxtral model)
- wav2vec2 — Wav2Vec2CTCTokenizer (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2CTCTokenizer (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
- wav2vec2_phoneme — Wav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
- whisper — WhisperTokenizer or WhisperTokenizerFast (Whisper model)
- xclip — CLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
- xglm — XGLMTokenizer or XGLMTokenizerFast (XGLM model)
- xlm — XLMTokenizer (XLM model)
- xlm-prophetnet — XLMProphetNetTokenizer (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
- xlnet — XLNetTokenizer or XLNetTokenizerFast (XLNet model)
- xlstm — GPTNeoXTokenizerFast (xLSTM model)
- xmod — XLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
- yoso — AlbertTokenizer or AlbertTokenizerFast (YOSO model)
- zamba — LlamaTokenizer or LlamaTokenizerFast (Zamba model)
- zamba2 — LlamaTokenizer or LlamaTokenizerFast (Zamba2 model)
Examples:
>>> from transformers import AutoTokenizer
>>> # Download vocabulary from huggingface.co and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> # Download vocabulary from huggingface.co (user-uploaded) and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")
>>> # If vocabulary files are in a directory (e.g. tokenizer was saved using *save_pretrained('./test/saved_model/')*)
>>> # tokenizer = AutoTokenizer.from_pretrained("./test/bert_saved_model/")
>>> # Download vocabulary from huggingface.co and define model-specific arguments
>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base", add_prefix_space=True)
register
< source >( config_class slow_tokenizer_class = None fast_tokenizer_class = None exist_ok = False )
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- slow_tokenizer_class (PreTrainedTokenizer, optional) — The slow tokenizer to register.
- fast_tokenizer_class (PreTrainedTokenizerFast, optional) — The fast tokenizer to register.
Register a new tokenizer in this mapping.
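The registry pattern behind this method can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the actual transformers implementation; `NewModelConfig` and `NewModelTokenizer` are hypothetical stand-ins for a real config/tokenizer pair.

```python
# Minimal sketch of the registry pattern used by AutoTokenizer.register
# (illustrative only -- not the actual transformers source).
# The auto class keeps a mapping from config class to a (slow, fast)
# tokenizer pair; register() adds an entry that later lookups consult.

TOKENIZER_MAPPING = {}  # config class -> (slow_tokenizer_class, fast_tokenizer_class)


def register(config_class, slow_tokenizer_class=None, fast_tokenizer_class=None, exist_ok=False):
    if slow_tokenizer_class is None and fast_tokenizer_class is None:
        raise ValueError("You need to pass either a slow or a fast tokenizer class")
    if config_class in TOKENIZER_MAPPING and not exist_ok:
        raise ValueError(f"'{config_class.__name__}' is already registered")
    TOKENIZER_MAPPING[config_class] = (slow_tokenizer_class, fast_tokenizer_class)


# Hypothetical custom classes standing in for a real config/tokenizer pair.
class NewModelConfig:
    model_type = "new-model"  # must match the key used to register the config


class NewModelTokenizer:
    pass


register(NewModelConfig, slow_tokenizer_class=NewModelTokenizer)
slow, fast = TOKENIZER_MAPPING[NewModelConfig]
print(slow.__name__)  # NewModelTokenizer
```

Registering the same config class twice raises unless `exist_ok=True` is passed, mirroring the `exist_ok` parameter documented above.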
AutoFeatureExtractor
This is a generic feature extractor class that will be instantiated as one of the feature extractor classes of the library when created with the AutoFeatureExtractor.from_pretrained() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_pretrained
< source >( pretrained_model_name_or_path **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
  - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the feature extractor files and overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final feature extractor object. If True, then this function returns a Tuple(feature_extractor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of kwargs which has not been used to update feature_extractor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the feature extractor classes of the library from a pretrained model vocabulary.
The feature extractor class to instantiate is selected based on the model_type property of the config object
(either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s
missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- audio-spectrogram-transformer — ASTFeatureExtractor (Audio Spectrogram Transformer model)
- beit — BeitFeatureExtractor (BEiT model)
- chinese_clip — ChineseCLIPFeatureExtractor (Chinese-CLIP model)
- clap — ClapFeatureExtractor (CLAP model)
- clip — CLIPFeatureExtractor (CLIP model)
- clipseg — ViTFeatureExtractor (CLIPSeg model)
- clvp — ClvpFeatureExtractor (CLVP model)
- conditional_detr — ConditionalDetrFeatureExtractor (Conditional DETR model)
- convnext — ConvNextFeatureExtractor (ConvNeXT model)
- cvt — ConvNextFeatureExtractor (CvT model)
- dac — DacFeatureExtractor (DAC model)
- data2vec-audio — Wav2Vec2FeatureExtractor (Data2VecAudio model)
- data2vec-vision — BeitFeatureExtractor (Data2VecVision model)
- deformable_detr — DeformableDetrFeatureExtractor (Deformable DETR model)
- deit — DeiTFeatureExtractor (DeiT model)
- detr — DetrFeatureExtractor (DETR model)
- dia — DiaFeatureExtractor (Dia model)
- dinat — ViTFeatureExtractor (DiNAT model)
- donut-swin — DonutFeatureExtractor (DonutSwin model)
- dpt — DPTFeatureExtractor (DPT model)
- encodec — EncodecFeatureExtractor (EnCodec model)
- flava — FlavaFeatureExtractor (FLAVA model)
- gemma3n — Gemma3nAudioFeatureExtractor (Gemma3nForConditionalGeneration model)
- glpn — GLPNFeatureExtractor (GLPN model)
- granite_speech — GraniteSpeechFeatureExtractor (GraniteSpeech model)
- groupvit — CLIPFeatureExtractor (GroupViT model)
- hubert — Wav2Vec2FeatureExtractor (Hubert model)
- imagegpt — ImageGPTFeatureExtractor (ImageGPT model)
- kyutai_speech_to_text — KyutaiSpeechToTextFeatureExtractor (KyutaiSpeechToText model)
- layoutlmv2 — LayoutLMv2FeatureExtractor (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3FeatureExtractor (LayoutLMv3 model)
- levit — LevitFeatureExtractor (LeViT model)
- maskformer — MaskFormerFeatureExtractor (MaskFormer model)
- mctct — MCTCTFeatureExtractor (M-CTC-T model)
- mimi — EncodecFeatureExtractor (Mimi model)
- mobilenet_v1 — MobileNetV1FeatureExtractor (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2FeatureExtractor (MobileNetV2 model)
- mobilevit — MobileViTFeatureExtractor (MobileViT model)
- moonshine — Wav2Vec2FeatureExtractor (Moonshine model)
- moshi — EncodecFeatureExtractor (Moshi model)
- nat — ViTFeatureExtractor (NAT model)
- owlvit — OwlViTFeatureExtractor (OWL-ViT model)
- parakeet_ctc — ParakeetFeatureExtractor (Parakeet model)
- parakeet_encoder — ParakeetFeatureExtractor (ParakeetEncoder model)
- perceiver — PerceiverFeatureExtractor (Perceiver model)
- phi4_multimodal — Phi4MultimodalFeatureExtractor (Phi4Multimodal model)
- poolformer — PoolFormerFeatureExtractor (PoolFormer model)
- pop2piano — Pop2PianoFeatureExtractor (Pop2Piano model)
- regnet — ConvNextFeatureExtractor (RegNet model)
- resnet — ConvNextFeatureExtractor (ResNet model)
- seamless_m4t — SeamlessM4TFeatureExtractor (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4TFeatureExtractor (SeamlessM4Tv2 model)
- segformer — SegformerFeatureExtractor (SegFormer model)
- sew — Wav2Vec2FeatureExtractor (SEW model)
- sew-d — Wav2Vec2FeatureExtractor (SEW-D model)
- speech_to_text — Speech2TextFeatureExtractor (Speech2Text model)
- speecht5 — SpeechT5FeatureExtractor (SpeechT5 model)
- swiftformer — ViTFeatureExtractor (SwiftFormer model)
- swin — ViTFeatureExtractor (Swin Transformer model)
- swinv2 — ViTFeatureExtractor (Swin Transformer V2 model)
- table-transformer — DetrFeatureExtractor (Table Transformer model)
- timesformer — VideoMAEFeatureExtractor (TimeSformer model)
- tvlt — TvltFeatureExtractor (TVLT model)
- unispeech — Wav2Vec2FeatureExtractor (UniSpeech model)
- unispeech-sat — Wav2Vec2FeatureExtractor (UniSpeechSat model)
- univnet — UnivNetFeatureExtractor (UnivNet model)
- van — ConvNextFeatureExtractor (VAN model)
- videomae — VideoMAEFeatureExtractor (VideoMAE model)
- vilt — ViltFeatureExtractor (ViLT model)
- vit — ViTFeatureExtractor (ViT model)
- vit_mae — ViTFeatureExtractor (ViTMAE model)
- vit_msn — ViTFeatureExtractor (ViTMSN model)
- wav2vec2 — Wav2Vec2FeatureExtractor (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2FeatureExtractor (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2FeatureExtractor (Wav2Vec2-Conformer model)
- wavlm — Wav2Vec2FeatureExtractor (WavLM model)
- whisper — WhisperFeatureExtractor (Whisper model)
- xclip — CLIPFeatureExtractor (X-CLIP model)
- xcodec — DacFeatureExtractor (X-CODEC model)
- yolos — YolosFeatureExtractor (YOLOS model)
Passing token=True is required when you want to use a private model.
Examples:
>>> from transformers import AutoFeatureExtractor
>>> # Download feature extractor from huggingface.co and cache.
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
>>> # If feature extractor files are in a directory (e.g. feature extractor was saved using *save_pretrained('./test/saved_model/')*)
>>> # feature_extractor = AutoFeatureExtractor.from_pretrained("./test/saved_model/")
register
< source >( config_class feature_extractor_class exist_ok = False )
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- feature_extractor_class (FeatureExtractorMixin) — The feature extractor to register.
Register a new feature extractor for this class.
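The selection logic that from_pretrained applies, as described in this section, can be sketched in plain Python: look up the config's model_type in the mapping, and fall back to pattern matching on the name or path when it is missing. This is an illustrative sketch only, not the transformers implementation; the two-entry mapping and `resolve_feature_extractor` helper are hypothetical.

```python
# Minimal sketch of the model_type-driven selection described above
# (illustrative only -- not the actual transformers source). Classes are
# represented by their names; the real mapping holds the classes themselves.

FEATURE_EXTRACTOR_MAPPING = {
    "wav2vec2": "Wav2Vec2FeatureExtractor",
    "whisper": "WhisperFeatureExtractor",
}


def resolve_feature_extractor(model_type, pretrained_model_name_or_path):
    # 1) Prefer the model_type from the loaded (or passed-in) config.
    if model_type in FEATURE_EXTRACTOR_MAPPING:
        return FEATURE_EXTRACTOR_MAPPING[model_type]
    # 2) Fall back to pattern matching on the name or path.
    for key, cls_name in FEATURE_EXTRACTOR_MAPPING.items():
        if key in pretrained_model_name_or_path:
            return cls_name
    raise ValueError(f"Unrecognized feature extractor for {pretrained_model_name_or_path!r}")


print(resolve_feature_extractor("wav2vec2", "facebook/wav2vec2-base-960h"))  # Wav2Vec2FeatureExtractor
print(resolve_feature_extractor(None, "openai/whisper-tiny"))                # WhisperFeatureExtractor
```

The second call shows the fallback: with no model_type available, the repo name itself decides which class is instantiated.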
AutoImageProcessor
This is a generic image processor class that will be instantiated as one of the image processor classes of the library when created with the AutoImageProcessor.from_pretrained() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_pretrained
< source >( pretrained_model_name_or_path *inputs **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co.
  - a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the image processor files and overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- use_fast (bool, optional, defaults to False) — Use a fast torchvision-based image processor if it is supported for a given model. If a fast image processor is not available for a given model, a normal numpy-based image processor is returned instead.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final image processor object. If True, then this function returns a Tuple(image_processor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not image processor attributes: i.e., the part of kwargs which has not been used to update image_processor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- image_processor_filename (str, optional, defaults to "config.json") — The name of the file in the model directory to use for the image processor config.
- kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are image processor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not image processor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the image processor classes of the library from a pretrained model vocabulary.
The image processor class to instantiate is selected based on the model_type property of the config object
(either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s
missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- aimv2 — CLIPImageProcessor or CLIPImageProcessorFast (AIMv2 model)
- aimv2_vision_model — CLIPImageProcessor or CLIPImageProcessorFast (Aimv2VisionModel model)
- align — EfficientNetImageProcessor or EfficientNetImageProcessorFast (ALIGN model)
- aria — AriaImageProcessor (Aria model)
- beit — BeitImageProcessor or BeitImageProcessorFast (BEiT model)
- bit — BitImageProcessor or BitImageProcessorFast (BiT model)
- blip — BlipImageProcessor or BlipImageProcessorFast (BLIP model)
- blip-2 — BlipImageProcessor or BlipImageProcessorFast (BLIP-2 model)
- bridgetower — BridgeTowerImageProcessor or BridgeTowerImageProcessorFast (BridgeTower model)
- chameleon — ChameleonImageProcessor or ChameleonImageProcessorFast (Chameleon model)
- chinese_clip — ChineseCLIPImageProcessor or ChineseCLIPImageProcessorFast (Chinese-CLIP model)
- clip — CLIPImageProcessor or CLIPImageProcessorFast (CLIP model)
- clipseg — ViTImageProcessor or ViTImageProcessorFast (CLIPSeg model)
- cohere2_vision — Cohere2VisionImageProcessorFast (Cohere2Vision model)
- conditional_detr — ConditionalDetrImageProcessor or ConditionalDetrImageProcessorFast (Conditional DETR model)
- convnext — ConvNextImageProcessor or ConvNextImageProcessorFast (ConvNeXT model)
- convnextv2 — ConvNextImageProcessor or ConvNextImageProcessorFast (ConvNeXTV2 model)
- cvt — ConvNextImageProcessor or ConvNextImageProcessorFast (CvT model)
- data2vec-vision — BeitImageProcessor or BeitImageProcessorFast (Data2VecVision model)
- deepseek_vl — DeepseekVLImageProcessor or DeepseekVLImageProcessorFast (DeepseekVL model)
- deepseek_vl_hybrid — DeepseekVLHybridImageProcessor or DeepseekVLHybridImageProcessorFast (DeepseekVLHybrid model)
- deformable_detr — DeformableDetrImageProcessor or DeformableDetrImageProcessorFast (Deformable DETR model)
- deit — DeiTImageProcessor or DeiTImageProcessorFast (DeiT model)
- depth_anything — DPTImageProcessor or DPTImageProcessorFast (Depth Anything model)
- depth_pro — DepthProImageProcessor or DepthProImageProcessorFast (DepthPro model)
- deta — DetaImageProcessor (DETA model)
- detr — DetrImageProcessor or DetrImageProcessorFast (DETR model)
- dinat — ViTImageProcessor or ViTImageProcessorFast (DiNAT model)
- dinov2 — BitImageProcessor or BitImageProcessorFast (DINOv2 model)
- dinov3_vit — DINOv3ViTImageProcessorFast (DINOv3 ViT model)
- donut-swin — DonutImageProcessor or DonutImageProcessorFast (DonutSwin model)
- dpt — DPTImageProcessor or DPTImageProcessorFast (DPT model)
- edgetam — Sam2ImageProcessorFast (EdgeTAM model)
- efficientformer — EfficientFormerImageProcessor (EfficientFormer model)
- efficientloftr — EfficientLoFTRImageProcessor or EfficientLoFTRImageProcessorFast (EfficientLoFTR model)
- efficientnet — EfficientNetImageProcessor or EfficientNetImageProcessorFast (EfficientNet model)
- eomt — EomtImageProcessor or EomtImageProcessorFast (EoMT model)
- flava — FlavaImageProcessor or FlavaImageProcessorFast (FLAVA model)
- focalnet — BitImageProcessor or BitImageProcessorFast (FocalNet model)
- fuyu — FuyuImageProcessor (Fuyu model)
- gemma3 — Gemma3ImageProcessor or Gemma3ImageProcessorFast (Gemma3ForConditionalGeneration model)
- gemma3n — SiglipImageProcessor or SiglipImageProcessorFast (Gemma3nForConditionalGeneration model)
- git — CLIPImageProcessor or CLIPImageProcessorFast (GIT model)
- glm4v — Glm4vImageProcessor or Glm4vImageProcessorFast (GLM4V model)
- glpn — GLPNImageProcessor (GLPN model)
- got_ocr2 — GotOcr2ImageProcessor or GotOcr2ImageProcessorFast (GOT-OCR2 model)
- grounding-dino — GroundingDinoImageProcessor or GroundingDinoImageProcessorFast (Grounding DINO model)
- groupvit — CLIPImageProcessor or CLIPImageProcessorFast (GroupViT model)
- hiera — BitImageProcessor or BitImageProcessorFast (Hiera model)
- idefics — IdeficsImageProcessor (IDEFICS model)
- idefics2 — Idefics2ImageProcessor or Idefics2ImageProcessorFast (Idefics2 model)
- idefics3 — Idefics3ImageProcessor or Idefics3ImageProcessorFast (Idefics3 model)
- ijepa — ViTImageProcessor or ViTImageProcessorFast (I-JEPA model)
- imagegpt — ImageGPTImageProcessor or ImageGPTImageProcessorFast (ImageGPT model)
- instructblip — BlipImageProcessor or BlipImageProcessorFast (InstructBLIP model)
- instructblipvideo — InstructBlipVideoImageProcessor (InstructBlipVideo model)
- janus — JanusImageProcessor or JanusImageProcessorFast (Janus model)
- kosmos-2 — CLIPImageProcessor or CLIPImageProcessorFast (KOSMOS-2 model)
- kosmos-2.5 — Kosmos2_5ImageProcessor or Kosmos2_5ImageProcessorFast (KOSMOS-2.5 model)
- layoutlmv2 — LayoutLMv2ImageProcessor or LayoutLMv2ImageProcessorFast (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3ImageProcessor or LayoutLMv3ImageProcessorFast (LayoutLMv3 model)
- levit — LevitImageProcessor or LevitImageProcessorFast (LeViT model)
- lfm2_vl — Lfm2VlImageProcessorFast (Lfm2Vl model)
- lightglue — LightGlueImageProcessor (LightGlue model)
- llama4 — Llama4ImageProcessor or Llama4ImageProcessorFast (Llama4 model)
- llava — LlavaImageProcessor or LlavaImageProcessorFast (LLaVa model)
- llava_next — LlavaNextImageProcessor or LlavaNextImageProcessorFast (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoImageProcessor (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionImageProcessor or LlavaOnevisionImageProcessorFast (LLaVA-Onevision model)
- mask2former — Mask2FormerImageProcessor or Mask2FormerImageProcessorFast (Mask2Former model)
- maskformer — MaskFormerImageProcessor or MaskFormerImageProcessorFast (MaskFormer model)
- metaclip_2 — CLIPImageProcessor or CLIPImageProcessorFast (MetaCLIP 2 model)
- mgp-str — ViTImageProcessor or ViTImageProcessorFast (MGP-STR model)
- mistral3 — PixtralImageProcessor or PixtralImageProcessorFast (Mistral3 model)
- mlcd — CLIPImageProcessor or CLIPImageProcessorFast (MLCD model)
- mllama — MllamaImageProcessor (Mllama model)
- mm-grounding-dino — GroundingDinoImageProcessor or GroundingDinoImageProcessorFast (MM Grounding DINO model)
- mobilenet_v1 — MobileNetV1ImageProcessor or MobileNetV1ImageProcessorFast (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2ImageProcessor or MobileNetV2ImageProcessorFast (MobileNetV2 model)
- mobilevit — MobileViTImageProcessor or MobileViTImageProcessorFast (MobileViT model)
- mobilevitv2 — MobileViTImageProcessor or MobileViTImageProcessorFast (MobileViTV2 model)
- nat — ViTImageProcessor or ViTImageProcessorFast (NAT model)
- nougat — NougatImageProcessor or NougatImageProcessorFast (Nougat model)
- oneformer — OneFormerImageProcessor or OneFormerImageProcessorFast (OneFormer model)
- ovis2 — Ovis2ImageProcessor or Ovis2ImageProcessorFast (Ovis2 model)
- owlv2 — Owlv2ImageProcessor or Owlv2ImageProcessorFast (OWLv2 model)
- owlvit — OwlViTImageProcessor or OwlViTImageProcessorFast (OWL-ViT model)
- paligemma — SiglipImageProcessor or SiglipImageProcessorFast (PaliGemma model)
- perceiver — PerceiverImageProcessor or PerceiverImageProcessorFast (Perceiver model)
- perception_lm — PerceptionLMImageProcessorFast (PerceptionLM model)
- phi4_multimodal — Phi4MultimodalImageProcessorFast (Phi4Multimodal model)
- pix2struct — Pix2StructImageProcessor (Pix2Struct model)
- pixtral — PixtralImageProcessor or PixtralImageProcessorFast (Pixtral model)
- poolformer — PoolFormerImageProcessor or PoolFormerImageProcessorFast (PoolFormer model)
- prompt_depth_anything — PromptDepthAnythingImageProcessor or PromptDepthAnythingImageProcessorFast (PromptDepthAnything model)
- pvt — PvtImageProcessor or PvtImageProcessorFast (PVT model)
- pvt_v2 — PvtImageProcessor or PvtImageProcessorFast (PVTv2 model)
- qwen2_5_vl — Qwen2VLImageProcessor or Qwen2VLImageProcessorFast (Qwen2_5_VL model)
- qwen2_vl — Qwen2VLImageProcessor or Qwen2VLImageProcessorFast (Qwen2VL model)
- qwen3_vl — Qwen2VLImageProcessor or Qwen2VLImageProcessorFast (Qwen3VL model)
- regnet — ConvNextImageProcessor or ConvNextImageProcessorFast (RegNet model)
- resnet — ConvNextImageProcessor or ConvNextImageProcessorFast (ResNet model)
- rt_detr — RTDetrImageProcessor or RTDetrImageProcessorFast (RT-DETR model)
- sam — SamImageProcessor or SamImageProcessorFast (SAM model)
- sam2 — Sam2ImageProcessorFast (SAM2 model)
- sam_hq — SamImageProcessor or SamImageProcessorFast (SAM-HQ model)
- segformer — SegformerImageProcessor or SegformerImageProcessorFast (SegFormer model)
- seggpt — SegGptImageProcessor (SegGPT model)
- shieldgemma2 — Gemma3ImageProcessor or Gemma3ImageProcessorFast (Shieldgemma2 model)
- siglip — SiglipImageProcessor or SiglipImageProcessorFast (SigLIP model)
- siglip2 — Siglip2ImageProcessor or Siglip2ImageProcessorFast (SigLIP2 model)
- smolvlm —
SmolVLMImageProcessororSmolVLMImageProcessorFast(SmolVLM model) - superglue —
SuperGlueImageProcessor(SuperGlue model) - superpoint —
SuperPointImageProcessororSuperPointImageProcessorFast(SuperPoint model) - swiftformer —
ViTImageProcessororViTImageProcessorFast(SwiftFormer model) - swin —
ViTImageProcessororViTImageProcessorFast(Swin Transformer model) - swin2sr —
Swin2SRImageProcessororSwin2SRImageProcessorFast(Swin2SR model) - swinv2 —
ViTImageProcessororViTImageProcessorFast(Swin Transformer V2 model) - table-transformer — DetrImageProcessor or DetrImageProcessorFast (Table Transformer model)
- textnet —
TextNetImageProcessororTextNetImageProcessorFast(TextNet model) - timesformer —
VideoMAEImageProcessor(TimeSformer model) - timm_wrapper —
TimmWrapperImageProcessor(TimmWrapperModel model) - tvlt —
TvltImageProcessor(TVLT model) - tvp —
TvpImageProcessororTvpImageProcessorFast(TVP model) - udop —
LayoutLMv3ImageProcessororLayoutLMv3ImageProcessorFast(UDOP model) - upernet —
SegformerImageProcessororSegformerImageProcessorFast(UPerNet model) - van — ConvNextImageProcessor or ConvNextImageProcessorFast (VAN model)
- videomae —
VideoMAEImageProcessor(VideoMAE model) - vilt —
ViltImageProcessororViltImageProcessorFast(ViLT model) - vipllava — CLIPImageProcessor or CLIPImageProcessorFast (VipLlava model)
- vit —
ViTImageProcessororViTImageProcessorFast(ViT model) - vit_hybrid —
ViTHybridImageProcessor(ViT Hybrid model) - vit_mae —
ViTImageProcessororViTImageProcessorFast(ViTMAE model) - vit_msn —
ViTImageProcessororViTImageProcessorFast(ViTMSN model) - vitmatte —
VitMatteImageProcessororVitMatteImageProcessorFast(ViTMatte model) - xclip — CLIPImageProcessor or CLIPImageProcessorFast (X-CLIP model)
- yolos —
YolosImageProcessororYolosImageProcessorFast(YOLOS model) - zoedepth —
ZoeDepthImageProcessororZoeDepthImageProcessorFast(ZoeDepth model)
Passing token=True is required when you want to use a private model.
Examples:
>>> from transformers import AutoImageProcessor
>>> # Download image processor from huggingface.co and cache.
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
>>> # If image processor files are in a directory (e.g. image processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # image_processor = AutoImageProcessor.from_pretrained("./test/saved_model/")
register
< source >( config_class image_processor_class = None slow_image_processor_class = None fast_image_processor_class = None exist_ok = False )
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- image_processor_class (ImageProcessingMixin) — The image processor to register.
Register a new image processor for this class.
AutoProcessor
This is a generic processor class that will be instantiated as one of the processor classes of the library when created with the AutoProcessor.from_pretrained() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_pretrained
< source >( pretrained_model_name_or_path **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
  - a path to a directory containing processor files saved using the save_pretrained() method, e.g., ./my_model_directory/.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the feature extractor files and override the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, this function returns just the final feature extractor object. If True, it returns a tuple (feature_extractor, unused_kwargs), where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of kwargs which has not been used to update feature_extractor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (dict[str, Any], optional) — The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the processor classes of the library from a pretrained model vocabulary.
The processor class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible):
- aimv2 — CLIPProcessor (AIMv2 model)
- align — AlignProcessor (ALIGN model)
- altclip — AltCLIPProcessor (AltCLIP model)
- aria — AriaProcessor (Aria model)
- aya_vision — AyaVisionProcessor (AyaVision model)
- bark — BarkProcessor (Bark model)
- blip — BlipProcessor (BLIP model)
- blip-2 — Blip2Processor (BLIP-2 model)
- bridgetower — BridgeTowerProcessor (BridgeTower model)
- chameleon — ChameleonProcessor (Chameleon model)
- chinese_clip — ChineseCLIPProcessor (Chinese-CLIP model)
- clap — ClapProcessor (CLAP model)
- clip — CLIPProcessor (CLIP model)
- clipseg — CLIPSegProcessor (CLIPSeg model)
- clvp — ClvpProcessor (CLVP model)
- cohere2_vision — Cohere2VisionProcessor (Cohere2Vision model)
- colpali — ColPaliProcessor (ColPali model)
- colqwen2 — ColQwen2Processor (ColQwen2 model)
- deepseek_vl — DeepseekVLProcessor (DeepseekVL model)
- deepseek_vl_hybrid — DeepseekVLHybridProcessor (DeepseekVLHybrid model)
- dia — DiaProcessor (Dia model)
- edgetam — Sam2Processor (EdgeTAM model)
- emu3 — Emu3Processor (Emu3 model)
- evolla — EvollaProcessor (Evolla model)
- flava — FlavaProcessor (FLAVA model)
- florence2 — Florence2Processor (Florence2 model)
- fuyu — FuyuProcessor (Fuyu model)
- gemma3 — Gemma3Processor (Gemma3ForConditionalGeneration model)
- gemma3n — Gemma3nProcessor (Gemma3nForConditionalGeneration model)
- git — GitProcessor (GIT model)
- glm4v — Glm4vProcessor (GLM4V model)
- glm4v_moe — Glm4vProcessor (GLM4VMOE model)
- got_ocr2 — GotOcr2Processor (GOT-OCR2 model)
- granite_speech — GraniteSpeechProcessor (GraniteSpeech model)
- grounding-dino — GroundingDinoProcessor (Grounding DINO model)
- groupvit — CLIPProcessor (GroupViT model)
- hubert — Wav2Vec2Processor (Hubert model)
- idefics — IdeficsProcessor (IDEFICS model)
- idefics2 — Idefics2Processor (Idefics2 model)
- idefics3 — Idefics3Processor (Idefics3 model)
- instructblip — InstructBlipProcessor (InstructBLIP model)
- instructblipvideo — InstructBlipVideoProcessor (InstructBlipVideo model)
- internvl — InternVLProcessor (InternVL model)
- janus — JanusProcessor (Janus model)
- kosmos-2 — Kosmos2Processor (KOSMOS-2 model)
- kosmos-2.5 — Kosmos2_5Processor (KOSMOS-2.5 model)
- kyutai_speech_to_text — KyutaiSpeechToTextProcessor (KyutaiSpeechToText model)
- layoutlmv2 — LayoutLMv2Processor (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Processor (LayoutLMv3 model)
- lfm2_vl — Lfm2VlProcessor (Lfm2Vl model)
- llama4 — Llama4Processor (Llama4 model)
- llava — LlavaProcessor (LLaVa model)
- llava_next — LlavaNextProcessor (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoProcessor (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionProcessor (LLaVA-Onevision model)
- markuplm — MarkupLMProcessor (MarkupLM model)
- mctct — MCTCTProcessor (M-CTC-T model)
- metaclip_2 — CLIPProcessor (MetaCLIP 2 model)
- mgp-str — MgpstrProcessor (MGP-STR model)
- mistral3 — PixtralProcessor (Mistral3 model)
- mllama — MllamaProcessor (Mllama model)
- mm-grounding-dino — GroundingDinoProcessor (MM Grounding DINO model)
- moonshine — Wav2Vec2Processor (Moonshine model)
- oneformer — OneFormerProcessor (OneFormer model)
- ovis2 — Ovis2Processor (Ovis2 model)
- owlv2 — Owlv2Processor (OWLv2 model)
- owlvit — OwlViTProcessor (OWL-ViT model)
- paligemma — PaliGemmaProcessor (PaliGemma model)
- perception_lm — PerceptionLMProcessor (PerceptionLM model)
- phi4_multimodal — Phi4MultimodalProcessor (Phi4Multimodal model)
- pix2struct — Pix2StructProcessor (Pix2Struct model)
- pixtral — PixtralProcessor (Pixtral model)
- pop2piano — Pop2PianoProcessor (Pop2Piano model)
- qwen2_5_omni — Qwen2_5OmniProcessor (Qwen2_5Omni model)
- qwen2_5_vl — Qwen2_5_VLProcessor (Qwen2_5_VL model)
- qwen2_audio — Qwen2AudioProcessor (Qwen2Audio model)
- qwen2_vl — Qwen2VLProcessor (Qwen2VL model)
- qwen3_omni_moe — Qwen3OmniMoeProcessor (Qwen3OmniMoE model)
- qwen3_vl — Qwen3VLProcessor (Qwen3VL model)
- qwen3_vl_moe — Qwen3VLProcessor (Qwen3VLMoe model)
- sam — SamProcessor (SAM model)
- sam2 — Sam2Processor (SAM2 model)
- sam_hq — SamHQProcessor (SAM-HQ model)
- seamless_m4t — SeamlessM4TProcessor (SeamlessM4T model)
- sew — Wav2Vec2Processor (SEW model)
- sew-d — Wav2Vec2Processor (SEW-D model)
- shieldgemma2 — ShieldGemma2Processor (Shieldgemma2 model)
- siglip — SiglipProcessor (SigLIP model)
- siglip2 — Siglip2Processor (SigLIP2 model)
- smolvlm — SmolVLMProcessor (SmolVLM model)
- speech_to_text — Speech2TextProcessor (Speech2Text model)
- speech_to_text_2 — Speech2Text2Processor (Speech2Text2 model)
- speecht5 — SpeechT5Processor (SpeechT5 model)
- trocr — TrOCRProcessor (TrOCR model)
- tvlt — TvltProcessor (TVLT model)
- tvp — TvpProcessor (TVP model)
- udop — UdopProcessor (UDOP model)
- unispeech — Wav2Vec2Processor (UniSpeech model)
- unispeech-sat — Wav2Vec2Processor (UniSpeechSat model)
- video_llava — VideoLlavaProcessor (VideoLlava model)
- vilt — ViltProcessor (ViLT model)
- vipllava — LlavaProcessor (VipLlava model)
- vision-text-dual-encoder — VisionTextDualEncoderProcessor (VisionTextDualEncoder model)
- voxtral — VoxtralProcessor (Voxtral model)
- wav2vec2 — Wav2Vec2Processor (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2Processor (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2Processor (Wav2Vec2-Conformer model)
- wavlm — Wav2Vec2Processor (WavLM model)
- whisper — WhisperProcessor (Whisper model)
- xclip — XCLIPProcessor (X-CLIP model)
Passing token=True is required when you want to use a private model.
Examples:
>>> from transformers import AutoProcessor
>>> # Download processor from huggingface.co and cache.
>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
>>> # If processor files are in a directory (e.g. processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # processor = AutoProcessor.from_pretrained("./test/saved_model/")
register
< source >( config_class processor_class exist_ok = False )
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- processor_class (ProcessorMixin) — The processor to register.
Register a new processor for this class.
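As with the other auto classes, a custom processor can be registered together with its configuration. A minimal sketch, where CustomConfig, CustomProcessor, and the "custom-multimodal" model type are hypothetical names invented for illustration:

```python
from transformers import AutoConfig, AutoProcessor, PretrainedConfig
from transformers.processing_utils import ProcessorMixin

# Hypothetical classes for illustration only.
class CustomConfig(PretrainedConfig):
    model_type = "custom-multimodal"

class CustomProcessor(ProcessorMixin):
    # A real processor would list its components here,
    # e.g. attributes = ["image_processor", "tokenizer"].
    attributes = []

# Register the config under its model_type, then map it to the processor.
AutoConfig.register("custom-multimodal", CustomConfig)
AutoProcessor.register(CustomConfig, CustomProcessor)
```

After registration, AutoProcessor.from_pretrained resolves checkpoints whose config declares model_type "custom-multimodal" to CustomProcessor.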
Generic model classes
The following auto classes are available for instantiating a base model class without a specific head.
AutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
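To sketch the difference between the two entry points: from_config builds a model with freshly initialized weights from a configuration alone (no download), while from_pretrained also loads the pretrained weights. A minimal offline example; the tiny configuration values below are illustrative, not a released checkpoint:

```python
from transformers import AutoModel, BertConfig

# Build a tiny BERT config locally (illustrative hyperparameters).
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# from_config dispatches on the configuration class: BertConfig -> BertModel.
# The returned model has random weights.
model = AutoModel.from_config(config)
print(type(model).__name__)  # BertModel
```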
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- ASTConfig configuration class: ASTModel (Audio Spectrogram Transformer model)
- Aimv2Config configuration class: Aimv2Model (AIMv2 model)
- Aimv2VisionConfig configuration class: Aimv2VisionModel (Aimv2VisionModel model)
- AlbertConfig configuration class: AlbertModel (ALBERT model)
- AlignConfig configuration class: AlignModel (ALIGN model)
- AltCLIPConfig configuration class: AltCLIPModel (AltCLIP model)
- ApertusConfig configuration class: ApertusModel (Apertus model)
- ArceeConfig configuration class: ArceeModel (Arcee model)
- AriaConfig configuration class: AriaModel (Aria model)
- AriaTextConfig configuration class: AriaTextModel (AriaText model)
- AutoformerConfig configuration class: AutoformerModel (Autoformer model)
- AyaVisionConfig configuration class: AyaVisionModel (AyaVision model)
- BambaConfig configuration class: BambaModel (Bamba model)
- BarkConfig configuration class: BarkModel (Bark model)
- BartConfig configuration class: BartModel (BART model)
- BeitConfig configuration class: BeitModel (BEiT model)
- BertConfig configuration class: BertModel (BERT model)
- BertGenerationConfig configuration class: BertGenerationEncoder (Bert Generation model)
- BigBirdConfig configuration class: BigBirdModel (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusModel (BigBird-Pegasus model)
- BioGptConfig configuration class: BioGptModel (BioGpt model)
- BitConfig configuration class: BitModel (BiT model)
- BitNetConfig configuration class: BitNetModel (BitNet model)
- BlenderbotConfig configuration class: BlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: BlenderbotSmallModel (BlenderbotSmall model)
- Blip2Config configuration class: Blip2Model (BLIP-2 model)
- Blip2QFormerConfig configuration class: Blip2QFormerModel (BLIP-2 QFormer model)
- BlipConfig configuration class: BlipModel (BLIP model)
- BloomConfig configuration class: BloomModel (BLOOM model)
- BltConfig configuration class: BltModel (Blt model)
- BridgeTowerConfig configuration class: BridgeTowerModel (BridgeTower model)
- BrosConfig configuration class: BrosModel (BROS model)
- CLIPConfig configuration class: CLIPModel (CLIP model)
- CLIPSegConfig configuration class: CLIPSegModel (CLIPSeg model)
- CLIPTextConfig configuration class: CLIPTextModel (CLIPTextModel model)
- CLIPVisionConfig configuration class: CLIPVisionModel (CLIPVisionModel model)
- CTRLConfig configuration class: CTRLModel (CTRL model)
- CamembertConfig configuration class: CamembertModel (CamemBERT model)
- CanineConfig configuration class: CanineModel (CANINE model)
- ChameleonConfig configuration class: ChameleonModel (Chameleon model)
- ChineseCLIPConfig configuration class: ChineseCLIPModel (Chinese-CLIP model)
- ChineseCLIPVisionConfig configuration class: ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
- ClapConfig configuration class: ClapModel (CLAP model)
- ClvpConfig configuration class: ClvpModelForConditionalGeneration (CLVP model)
- CodeGenConfig configuration class: CodeGenModel (CodeGen model)
- Cohere2Config configuration class: Cohere2Model (Cohere2 model)
- Cohere2VisionConfig configuration class: Cohere2VisionModel (Cohere2Vision model)
- CohereConfig configuration class: CohereModel (Cohere model)
- ConditionalDetrConfig configuration class: ConditionalDetrModel (Conditional DETR model)
- ConvBertConfig configuration class: ConvBertModel (ConvBERT model)
- ConvNextConfig configuration class: ConvNextModel (ConvNeXT model)
- ConvNextV2Config configuration class: ConvNextV2Model (ConvNeXTV2 model)
- CpmAntConfig configuration class: CpmAntModel (CPM-Ant model)
- CsmConfig configuration class: CsmForConditionalGeneration (CSM model)
- CvtConfig configuration class: CvtModel (CvT model)
- DFineConfig configuration class: DFineModel (D-FINE model)
- DINOv3ConvNextConfig configuration class: DINOv3ConvNextModel (DINOv3 ConvNext model)
- DINOv3ViTConfig configuration class: DINOv3ViTModel (DINOv3 ViT model)
- DPRConfig configuration class: DPRQuestionEncoder (DPR model)
- DPTConfig configuration class: DPTModel (DPT model)
- DabDetrConfig configuration class: DabDetrModel (DAB-DETR model)
- DacConfig configuration class: DacModel (DAC model)
- Data2VecAudioConfig configuration class: Data2VecAudioModel (Data2VecAudio model)
- Data2VecTextConfig configuration class: Data2VecTextModel (Data2VecText model)
- Data2VecVisionConfig configuration class: Data2VecVisionModel (Data2VecVision model)
- DbrxConfig configuration class: DbrxModel (DBRX model)
- DebertaConfig configuration class: DebertaModel (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2Model (DeBERTa-v2 model)
- DecisionTransformerConfig configuration class: DecisionTransformerModel (Decision Transformer model)
- DeepseekV2Config configuration class: DeepseekV2Model (DeepSeek-V2 model)
- DeepseekV3Config configuration class: DeepseekV3Model (DeepSeek-V3 model)
- DeepseekVLConfig configuration class: DeepseekVLModel (DeepseekVL model)
- DeepseekVLHybridConfig configuration class: DeepseekVLHybridModel (DeepseekVLHybrid model)
- DeformableDetrConfig configuration class: DeformableDetrModel (Deformable DETR model)
- DeiTConfig configuration class: DeiTModel (DeiT model)
- DepthProConfig configuration class: DepthProModel (DepthPro model)
- DetaConfig configuration class: DetaModel (DETA model)
- DetrConfig configuration class: DetrModel (DETR model)
- DiaConfig configuration class: DiaModel (Dia model)
- DiffLlamaConfig configuration class: DiffLlamaModel (DiffLlama model)
- DinatConfig configuration class: DinatModel (DiNAT model)
- Dinov2Config configuration class: Dinov2Model (DINOv2 model)
- Dinov2WithRegistersConfig configuration class: Dinov2WithRegistersModel (DINOv2 with Registers model)
- DistilBertConfig configuration class: DistilBertModel (DistilBERT model)
- DogeConfig configuration class: DogeModel (Doge model)
- DonutSwinConfig configuration class: DonutSwinModel (DonutSwin model)
- Dots1Config configuration class: Dots1Model (dots1 model)
- EdgeTamConfig configuration class: EdgeTamModel (EdgeTAM model)
- EdgeTamVideoConfig configuration class: EdgeTamVideoModel (EdgeTamVideo model)
- EdgeTamVisionConfig configuration class: EdgeTamVisionModel (EdgeTamVisionModel model)
- EfficientFormerConfig configuration class: EfficientFormerModel (EfficientFormer model)
- EfficientLoFTRConfig configuration class: EfficientLoFTRModel (EfficientLoFTR model)
- EfficientNetConfig configuration class: EfficientNetModel (EfficientNet model)
- ElectraConfig configuration class: ElectraModel (ELECTRA model)
- Emu3Config configuration class: Emu3Model (Emu3 model)
- EncodecConfig configuration class: EncodecModel (EnCodec model)
- Ernie4_5Config configuration class: Ernie4_5Model (Ernie4_5 model)
- Ernie4_5_MoeConfig configuration class: Ernie4_5_MoeModel (Ernie4_5_MoE model)
- ErnieConfig configuration class: ErnieModel (ERNIE model)
- ErnieMConfig configuration class: ErnieMModel (ErnieM model)
- EsmConfig configuration class: EsmModel (ESM model)
- EvollaConfig configuration class: EvollaModel (Evolla model)
- Exaone4Config configuration class: Exaone4Model (EXAONE-4.0 model)
- FNetConfig configuration class: FNetModel (FNet model)
- FSMTConfig configuration class: FSMTModel (FairSeq Machine-Translation model)
- FalconConfig configuration class: FalconModel (Falcon model)
- FalconH1Config configuration class: FalconH1Model (FalconH1 model)
- FalconMambaConfig configuration class: FalconMambaModel (FalconMamba model)
- FastSpeech2ConformerConfig configuration class: FastSpeech2ConformerModel (FastSpeech2Conformer model)
- FastSpeech2ConformerWithHifiGanConfig configuration class: FastSpeech2ConformerWithHifiGan (FastSpeech2ConformerWithHifiGan model)
- FlaubertConfig configuration class: FlaubertModel (FlauBERT model)
- FlavaConfig configuration class: FlavaModel (FLAVA model)
- FlexOlmoConfig configuration class: FlexOlmoModel (FlexOlmo model)
- Florence2Config configuration class: Florence2Model (Florence2 model)
- FocalNetConfig configuration class: FocalNetModel (FocalNet model)
- FunnelConfig configuration class: FunnelModel or FunnelBaseModel (Funnel Transformer model)
- FuyuConfig configuration class: FuyuModel (Fuyu model)
- GLPNConfig configuration class: GLPNModel (GLPN model)
- GPT2Config configuration class: GPT2Model (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeModel (GPTBigCode model)
- GPTJConfig configuration class: GPTJModel (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoModel (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXModel (GPT NeoX model)
- GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseModel (GPT NeoX Japanese model)
- GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- Gemma2Config configuration class: Gemma2Model (Gemma2 model)
- Gemma3Config configuration class: Gemma3Model (Gemma3ForConditionalGeneration model)
- Gemma3TextConfig configuration class: Gemma3TextModel (Gemma3ForCausalLM model)
- Gemma3nAudioConfig configuration class: Gemma3nAudioEncoder (Gemma3nAudioEncoder model)
- Gemma3nConfig configuration class: Gemma3nModel (Gemma3nForConditionalGeneration model)
- Gemma3nTextConfig configuration class: Gemma3nTextModel (Gemma3nForCausalLM model)
- Gemma3nVisionConfig configuration class: TimmWrapperModel (TimmWrapperModel model)
- GemmaConfig configuration class: GemmaModel (Gemma model)
- GitConfig configuration class: GitModel (GIT model)
- Glm4Config configuration class: Glm4Model (GLM4 model)
- Glm4MoeConfig configuration class: Glm4MoeModel (Glm4MoE model)
- Glm4vConfig configuration class: Glm4vModel (GLM4V model)
- Glm4vMoeConfig configuration class: Glm4vMoeModel (GLM4VMOE model)
- Glm4vMoeTextConfig configuration class: Glm4vMoeTextModel (GLM4VMOE model)
- Glm4vTextConfig configuration class: Glm4vTextModel (GLM4V model)
- GlmConfig configuration class: GlmModel (GLM model)
- GotOcr2Config configuration class: GotOcr2Model (GOT-OCR2 model)
- GptOssConfig configuration class: GptOssModel (GptOss model)
- GraniteConfig configuration class: GraniteModel (Granite model)
- GraniteMoeConfig configuration class: GraniteMoeModel (GraniteMoeMoe model)
- GraniteMoeHybridConfig configuration class: GraniteMoeHybridModel (GraniteMoeHybrid model)
- GraniteMoeSharedConfig configuration class: GraniteMoeSharedModel (GraniteMoeSharedMoe model)
- GraphormerConfig configuration class: GraphormerModel (Graphormer model)
- GroundingDinoConfig configuration class: GroundingDinoModel (Grounding DINO model)
- GroupViTConfig configuration class: GroupViTModel (GroupViT model)
- HGNetV2Config configuration class: HGNetV2Backbone (HGNet-V2 model)
- HeliumConfig configuration class: HeliumModel (Helium model)
- HieraConfig configuration class: HieraModel (Hiera model)
- HubertConfig configuration class: HubertModel (Hubert model)
- HunYuanDenseV1Config configuration class: HunYuanDenseV1Model (HunYuanDenseV1 model)
- HunYuanMoEV1Config configuration class: HunYuanMoEV1Model (HunYuanMoeV1 model)
- IBertConfig configuration class: IBertModel (I-BERT model)
- IJepaConfig configuration class: IJepaModel (I-JEPA model)
- Idefics2Config configuration class: Idefics2Model (Idefics2 model)
- Idefics3Config configuration class: Idefics3Model (Idefics3 model)
- Idefics3VisionConfig configuration class: Idefics3VisionTransformer (Idefics3VisionTransformer model)
- IdeficsConfig configuration class: IdeficsModel (IDEFICS model)
- ImageGPTConfig configuration class: ImageGPTModel (ImageGPT model)
- InformerConfig configuration class: InformerModel (Informer model)
- InstructBlipConfig configuration class: InstructBlipModel (InstructBLIP model)
- InstructBlipVideoConfig configuration class: InstructBlipVideoModel (InstructBlipVideo model)
- InternVLConfig configuration class: InternVLModel (InternVL model)
- InternVLVisionConfig configuration class: InternVLVisionModel (InternVLVision model)
- JambaConfig configuration class: JambaModel (Jamba model)
- JanusConfig configuration class: JanusModel (Janus model)
- JetMoeConfig configuration class: JetMoeModel (JetMoe model)
- JukeboxConfig configuration class: JukeboxModel (Jukebox model)
- Kosmos2Config configuration class: Kosmos2Model (KOSMOS-2 model)
- Kosmos2_5Config configuration class: Kosmos2_5Model (KOSMOS-2.5 model)
- KyutaiSpeechToTextConfig configuration class: KyutaiSpeechToTextModel (KyutaiSpeechToText model)
- LEDConfig configuration class: LEDModel (LED model)
- LayoutLMConfig configuration class: LayoutLMModel (LayoutLM model)
- LayoutLMv2Config configuration class: LayoutLMv2Model (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3Model (LayoutLMv3 model)
- LevitConfig configuration class: LevitModel (LeViT model)
- Lfm2Config configuration class: Lfm2Model (Lfm2 model)
- Lfm2VlConfig configuration class: Lfm2VlModel (Lfm2Vl model)
- LightGlueConfig configuration class: LightGlueForKeypointMatching (LightGlue model)
- LiltConfig configuration class: LiltModel (LiLT model)
- Llama4Config configuration class: Llama4ForConditionalGeneration (Llama4 model)
- Llama4TextConfig configuration class: Llama4TextModel (Llama4ForCausalLM model)
- LlamaConfig configuration class: LlamaModel (LLaMA model)
- LlavaConfig configuration class: LlavaModel (LLaVa model)
- LlavaNextConfig configuration class: LlavaNextModel (LLaVA-NeXT model)
- LlavaNextVideoConfig configuration class: LlavaNextVideoModel (LLaVa-NeXT-Video model)
- LlavaOnevisionConfig configuration class: LlavaOnevisionModel (LLaVA-Onevision model)
- LongT5Config configuration class: LongT5Model (LongT5 model)
- LongcatFlashConfig configuration class: LongcatFlashModel (LongCatFlash model)
- LongformerConfig configuration class: LongformerModel (Longformer model)
- LukeConfig configuration class: LukeModel (LUKE model)
- LxmertConfig configuration class: LxmertModel (LXMERT model)
- M2M100Config configuration class: M2M100Model (M2M100 model)
- MBartConfig configuration class: MBartModel (mBART model)
- MCTCTConfig configuration class: MCTCTModel (M-CTC-T model)
- MLCDVisionConfig configuration class: MLCDVisionModel (MLCD model)
- MMGroundingDinoConfig configuration class: MMGroundingDinoModel (MM Grounding DINO model)
- MPNetConfig configuration class: MPNetModel (MPNet model)
- MT5Config configuration class: MT5Model (MT5 model)
- Mamba2Config configuration class: Mamba2Model (mamba2 model)
- MambaConfig configuration class: MambaModel (Mamba model)
- MarianConfig configuration class: MarianModel (Marian model)
- MarkupLMConfig configuration class: MarkupLMModel (MarkupLM model)
- Mask2FormerConfig configuration class: Mask2FormerModel (Mask2Former model)
- MaskFormerConfig configuration class: MaskFormerModel (MaskFormer model)
- MaskFormerSwinConfig configuration class: MaskFormerSwinModel (MaskFormerSwin model)
- MegaConfig configuration class: MegaModel (MEGA model)
- MegatronBertConfig configuration class: MegatronBertModel (Megatron-BERT model)
- MetaClip2Config configuration class: MetaClip2Model (MetaCLIP 2 model)
- MgpstrConfig configuration class: MgpstrForSceneTextRecognition (MGP-STR model)
- MimiConfig configuration class: MimiModel (Mimi model)
- MiniMaxConfig configuration class: MiniMaxModel (MiniMax model)
- MinistralConfig configuration class: MinistralModel (Ministral model)
- Mistral3Config configuration class: Mistral3Model (Mistral3 model)
- MistralConfig configuration class: MistralModel (Mistral model)
- MixtralConfig configuration class: MixtralModel (Mixtral model)
- MllamaConfig configuration class: MllamaModel (Mllama model)
- MobileBertConfig configuration class: MobileBertModel (MobileBERT model)
- MobileNetV1Config configuration class: MobileNetV1Model (MobileNetV1 model)
- MobileNetV2Config configuration class: MobileNetV2Model (MobileNetV2 model)
- MobileViTConfig configuration class: MobileViTModel (MobileViT model)
- MobileViTV2Config configuration class: MobileViTV2Model (MobileViTV2 model)
- ModernBertConfig configuration class: ModernBertModel (ModernBERT model)
- ModernBertDecoderConfig configuration class: ModernBertDecoderModel (ModernBertDecoder model)
- MoonshineConfig configuration class: MoonshineModel (Moonshine model)
- MoshiConfig configuration class: MoshiModel (Moshi model)
- MptConfig configuration class: MptModel (MPT model)
- MraConfig configuration class: MraModel (MRA model)
- MusicgenConfig configuration class: MusicgenModel (MusicGen model)
- MusicgenMelodyConfig configuration class: MusicgenMelodyModel (MusicGen Melody model)
- MvpConfig configuration class: MvpModel (MVP model)
- NatConfig configuration class: NatModel (NAT model)
- NemotronConfig configuration class: NemotronModel (Nemotron model)
- NezhaConfig configuration class: NezhaModel (Nezha model)
- NllbMoeConfig configuration class: NllbMoeModel (NLLB-MOE model)
- NystromformerConfig configuration class: NystromformerModel (Nyströmformer model)
- OPTConfig configuration class: OPTModel (OPT model)
- Olmo2Config configuration class: Olmo2Model (OLMo2 model)
- Olmo3Config configuration class: Olmo3Model (Olmo3 model)
- OlmoConfig configuration class: OlmoModel (OLMo model)
- OlmoeConfig configuration class: OlmoeModel (OLMoE model)
- OmDetTurboConfig configuration class: OmDetTurboForObjectDetection (OmDet-Turbo model)
- OneFormerConfig configuration class: OneFormerModel (OneFormer model)
- OpenAIGPTConfig configuration class: OpenAIGPTModel (OpenAI GPT model)
- OpenLlamaConfig configuration class: OpenLlamaModel (OpenLlama model)
- Ovis2Config configuration class: Ovis2Model (Ovis2 model)
- OwlViTConfig configuration class: OwlViTModel (OWL-ViT model)
- Owlv2Config configuration class: Owlv2Model (OWLv2 model)
- PLBartConfig configuration class: PLBartModel (PLBart model)
- PaliGemmaConfig configuration class: PaliGemmaModel (PaliGemma model)
- ParakeetCTCConfig configuration class: ParakeetForCTC (Parakeet model)
- ParakeetEncoderConfig configuration class: ParakeetEncoder (ParakeetEncoder model)
- PatchTSMixerConfig configuration class: PatchTSMixerModel (PatchTSMixer model)
- PatchTSTConfig configuration class: PatchTSTModel (PatchTST model)
- PegasusConfig configuration class: PegasusModel (Pegasus model)
- PegasusXConfig configuration class: PegasusXModel (PEGASUS-X model)
- PerceiverConfig configuration class: PerceiverModel (Perceiver model)
- PerceptionLMConfig configuration class: PerceptionLMModel (PerceptionLM model)
- PersimmonConfig configuration class: PersimmonModel (Persimmon model)
- Phi3Config configuration class: Phi3Model (Phi3 model)
- Phi4MultimodalConfig configuration class: Phi4MultimodalModel (Phi4Multimodal model)
- PhiConfig configuration class: PhiModel (Phi model)
- PhimoeConfig configuration class: PhimoeModel (Phimoe model)
- PixtralVisionConfig configuration class: PixtralVisionModel (Pixtral model)
- PoolFormerConfig configuration class: PoolFormerModel (PoolFormer model)
- ProphetNetConfig configuration class: ProphetNetModel (ProphetNet model)
- PvtConfig configuration class: PvtModel (PVT model)
- PvtV2Config configuration class: PvtV2Model (PVTv2 model)
- QDQBertConfig configuration class: QDQBertModel (QDQBert model)
- Qwen2AudioEncoderConfig configuration class: Qwen2AudioEncoder (Qwen2AudioEncoder model)
- Qwen2Config configuration class: Qwen2Model (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeModel (Qwen2MoE model)
- Qwen2VLConfig configuration class: Qwen2VLModel (Qwen2VL model)
- Qwen2VLTextConfig configuration class: Qwen2VLTextModel (Qwen2VL model)
- Qwen2_5_VLConfig configuration class: Qwen2_5_VLModel (Qwen2_5_VL model)
- Qwen2_5_VLTextConfig configuration class: Qwen2_5_VLTextModel (Qwen2_5_VL model)
- Qwen3Config configuration class: Qwen3Model (Qwen3 model)
- Qwen3MoeConfig configuration class: Qwen3MoeModel (Qwen3MoE model)
- Qwen3NextConfig configuration class: Qwen3NextModel (Qwen3Next model)
- Qwen3VLConfig configuration class: Qwen3VLModel (Qwen3VL model)
- Qwen3VLMoeConfig configuration class: Qwen3VLMoeModel (Qwen3VLMoe model)
- Qwen3VLMoeTextConfig configuration class: Qwen3VLMoeTextModel (Qwen3VLMoe model)
- Qwen3VLTextConfig configuration class: Qwen3VLTextModel (Qwen3VL model)
- RTDetrConfig configuration class: RTDetrModel (RT-DETR model)
- RTDetrV2Config configuration class: RTDetrV2Model (RT-DETRv2 model)
- RecurrentGemmaConfig configuration class: RecurrentGemmaModel (RecurrentGemma model)
- ReformerConfig configuration class: ReformerModel (Reformer model)
- RegNetConfig configuration class: RegNetModel (RegNet model)
- RemBertConfig configuration class: RemBertModel (RemBERT model)
- ResNetConfig configuration class: ResNetModel (ResNet model)
- RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
- RoCBertConfig configuration class: RoCBertModel (RoCBert model)
- RoFormerConfig configuration class: RoFormerModel (RoFormer model)
- RobertaConfig configuration class: RobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvModel (RWKV model)
- SEWConfig configuration class: SEWModel (SEW model)
- SEWDConfig configuration class: SEWDModel (SEW-D model)
- Sam2Config configuration class: Sam2Model (SAM2 model)
- Sam2HieraDetConfig configuration class: Sam2HieraDetModel (Sam2HieraDetModel model)
- Sam2VideoConfig configuration class: Sam2VideoModel (Sam2VideoModel model)
- Sam2VisionConfig configuration class: Sam2VisionModel (Sam2VisionModel model)
- SamConfig configuration class: SamModel (SAM model)
- SamHQConfig configuration class: SamHQModel (SAM-HQ model)
- SamHQVisionConfig configuration class: SamHQVisionModel (SamHQVisionModel model)
- SamVisionConfig configuration class: SamVisionModel (SamVisionModel model)
- SeamlessM4TConfig configuration class: SeamlessM4TModel (SeamlessM4T model)
- SeamlessM4Tv2Config configuration class: SeamlessM4Tv2Model (SeamlessM4Tv2 model)
- SeedOssConfig configuration class: SeedOssModel (SeedOss model)
- SegGptConfig configuration class: SegGptModel (SegGPT model)
- SegformerConfig configuration class: SegformerModel (SegFormer model)
- Siglip2Config configuration class: Siglip2Model (SigLIP2 model)
- Siglip2VisionConfig configuration class: Siglip2VisionModel (Siglip2VisionModel model)
- SiglipConfig configuration class: SiglipModel (SigLIP model)
- SiglipVisionConfig configuration class: SiglipVisionModel (SiglipVisionModel model)
- SmolLM3Config configuration class: SmolLM3Model (SmolLM3 model)
- SmolVLMConfig configuration class: SmolVLMModel (SmolVLM model)
- SmolVLMVisionConfig configuration class: SmolVLMVisionTransformer (SmolVLMVisionTransformer model)
- Speech2TextConfig configuration class: Speech2TextModel (Speech2Text model)
- SpeechT5Config configuration class: SpeechT5Model (SpeechT5 model)
- SplinterConfig configuration class: SplinterModel (Splinter model)
- SqueezeBertConfig configuration class: SqueezeBertModel (SqueezeBERT model)
- StableLmConfig configuration class: StableLmModel (StableLm model)
- Starcoder2Config configuration class: Starcoder2Model (Starcoder2 model)
- SwiftFormerConfig configuration class: SwiftFormerModel (SwiftFormer model)
- Swin2SRConfig configuration class: Swin2SRModel (Swin2SR model)
- SwinConfig configuration class: SwinModel (Swin Transformer model)
- Swinv2Config configuration class: Swinv2Model (Swin Transformer V2 model)
- SwitchTransformersConfig configuration class: SwitchTransformersModel (SwitchTransformers model)
- T5Config configuration class: T5Model (T5 model)
- T5GemmaConfig configuration class: T5GemmaModel (T5Gemma model)
- TableTransformerConfig configuration class: TableTransformerModel (Table Transformer model)
- TapasConfig configuration class: TapasModel (TAPAS model)
- TextNetConfig configuration class: TextNetModel (TextNet model)
- TimeSeriesTransformerConfig configuration class: TimeSeriesTransformerModel (Time Series Transformer model)
- TimesFmConfig configuration class: TimesFmModel (TimesFm model)
- TimesformerConfig configuration class: TimesformerModel (TimeSformer model)
- TimmBackboneConfig configuration class: TimmBackbone (TimmBackbone model)
- TimmWrapperConfig configuration class: TimmWrapperModel (TimmWrapperModel model)
- TrajectoryTransformerConfig configuration class: TrajectoryTransformerModel (Trajectory Transformer model)
- TransfoXLConfig configuration class: TransfoXLModel (Transformer-XL model)
- TvltConfig configuration class: TvltModel (TVLT model)
- TvpConfig configuration class: TvpModel (TVP model)
- UMT5Config configuration class: UMT5Model (UMT5 model)
- UdopConfig configuration class: UdopModel (UDOP model)
- UniSpeechConfig configuration class: UniSpeechModel (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatModel (UniSpeechSat model)
- UnivNetConfig configuration class: UnivNetModel (UnivNet model)
- VJEPA2Config configuration class: VJEPA2Model (VJEPA2Model model)
- VanConfig configuration class: VanModel (VAN
model)VaultGemmaConfigconfiguration class:VaultGemmaModel(VaultGemma model)ViTConfigconfiguration class:ViTModel(ViT model)ViTHybridConfigconfiguration class:ViTHybridModel(ViT Hybrid model)ViTMAEConfigconfiguration class:ViTMAEModel(ViTMAE model)ViTMSNConfigconfiguration class:ViTMSNModel(ViTMSN model)VideoLlavaConfigconfiguration class:VideoLlavaModel(VideoLlava model)VideoMAEConfigconfiguration class:VideoMAEModel(VideoMAE model)ViltConfigconfiguration class:ViltModel(ViLT model)VipLlavaConfigconfiguration class:VipLlavaModel(VipLlava model)VisionTextDualEncoderConfigconfiguration class:VisionTextDualEncoderModel(VisionTextDualEncoder model)VisualBertConfigconfiguration class:VisualBertModel(VisualBERT model)VitDetConfigconfiguration class:VitDetModel(VitDet model)VitsConfigconfiguration class:VitsModel(VITS model)VivitConfigconfiguration class:VivitModel(ViViT model)VoxtralConfigconfiguration class:VoxtralForConditionalGeneration(Voxtral model)VoxtralEncoderConfigconfiguration class:VoxtralEncoder(Voxtral Encoder model)Wav2Vec2BertConfigconfiguration class:Wav2Vec2BertModel(Wav2Vec2-BERT model)Wav2Vec2Configconfiguration class:Wav2Vec2Model(Wav2Vec2 model)Wav2Vec2ConformerConfigconfiguration class:Wav2Vec2ConformerModel(Wav2Vec2-Conformer model)WavLMConfigconfiguration class:WavLMModel(WavLM model)WhisperConfigconfiguration class:WhisperModel(Whisper model)XCLIPConfigconfiguration class:XCLIPModel(X-CLIP model)XGLMConfigconfiguration class:XGLMModel(XGLM model)XLMConfigconfiguration class:XLMModel(XLM model)XLMProphetNetConfigconfiguration class:XLMProphetNetModel(XLM-ProphetNet model)XLMRobertaConfigconfiguration class:XLMRobertaModel(XLM-RoBERTa model)XLMRobertaXLConfigconfiguration class:XLMRobertaXLModel(XLM-RoBERTa-XL model)XLNetConfigconfiguration class:XLNetModel(XLNet model)XcodecConfigconfiguration class:XcodecModel(X-CODEC model)XmodConfigconfiguration class:XmodModel(X-MOD model)YolosConfigconfiguration class:YolosModel(YOLOS 
model)YosoConfigconfiguration class:YosoModel(YOSO model)Zamba2Configconfiguration class:Zamba2Model(Zamba2 model)ZambaConfigconfiguration class:ZambaModel(Zamba model)xLSTMConfigconfiguration class:xLSTMModel(xLSTM model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
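The config-only behavior of from_config() can be sketched in plain Python. This is a hypothetical mini registry, not the real transformers internals: the point is that from_config() dispatches on the type of the config object and builds a freshly (randomly) initialized model, while loading saved weights is left to from_pretrained().

```python
import random

# Hypothetical stand-ins for a custom config/model pair (illustrative names).
class NewModelConfig:
    model_type = "new-model"

    def __init__(self, hidden_size=4):
        self.hidden_size = hidden_size

class NewModel:
    config_class = NewModelConfig

    def __init__(self, config):
        self.config = config
        # Only hyperparameters from the config are used; weights are fresh.
        self.weights = [random.random() for _ in range(config.hidden_size)]

class AutoModelSketch:
    # Maps a config class to the model class it should instantiate.
    _registry = {NewModelConfig: NewModel}

    @classmethod
    def from_config(cls, config):
        # Dispatch on the type of the config object; no weights are loaded.
        model_class = cls._registry[type(config)]
        return model_class(config)

model = AutoModelSketch.from_config(NewModelConfig(hidden_size=2))
print(type(model).__name__, len(model.weights))  # NewModel 2
```

The real AutoModel.from_config() works the same way in spirit: the configuration selects the architecture, and the returned model must still be trained or loaded with pretrained weights before use.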
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
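The kwargs-splitting behavior described above (when no config is passed) can be sketched in a few lines of plain Python. The names here are illustrative, not the real transformers internals: keys that match a configuration attribute update the config, and everything else is forwarded to the model's __init__.

```python
# Hypothetical sketch of how **kwargs are split when no explicit config is given.
class ConfigSketch:
    def __init__(self):
        self.output_attentions = False
        self.hidden_size = 4

def split_kwargs(config, **kwargs):
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)   # overrides a config attribute
        else:
            model_kwargs[key] = value     # forwarded to the model __init__
    return config, model_kwargs

config, model_kwargs = split_kwargs(
    ConfigSketch(), output_attentions=True, custom_flag=1
)
print(config.output_attentions, model_kwargs)  # True {'custom_flag': 1}
```

This is why, in the examples below, passing output_attentions=True to from_pretrained() ends up on model.config rather than on the model constructor.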
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- aimv2 — Aimv2Model (AIMv2 model)
- aimv2_vision_model — Aimv2VisionModel (Aimv2VisionModel model)
- albert — AlbertModel (ALBERT model)
- align — AlignModel (ALIGN model)
- altclip — AltCLIPModel (AltCLIP model)
- apertus — ApertusModel (Apertus model)
- arcee — ArceeModel (Arcee model)
- aria — AriaModel (Aria model)
- aria_text — AriaTextModel (AriaText model)
- audio-spectrogram-transformer — ASTModel (Audio Spectrogram Transformer model)
- autoformer — AutoformerModel (Autoformer model)
- aya_vision — AyaVisionModel (AyaVision model)
- bamba — BambaModel (Bamba model)
- bark — BarkModel (Bark model)
- bart — BartModel (BART model)
- beit — BeitModel (BEiT model)
- bert — BertModel (BERT model)
- bert-generation — BertGenerationEncoder (Bert Generation model)
- big_bird — BigBirdModel (BigBird model)
- bigbird_pegasus — BigBirdPegasusModel (BigBird-Pegasus model)
- biogpt — BioGptModel (BioGpt model)
- bit — BitModel (BiT model)
- bitnet — BitNetModel (BitNet model)
- blenderbot — BlenderbotModel (Blenderbot model)
- blenderbot-small — BlenderbotSmallModel (BlenderbotSmall model)
- blip — BlipModel (BLIP model)
- blip-2 — Blip2Model (BLIP-2 model)
- blip_2_qformer — Blip2QFormerModel (BLIP-2 QFormer model)
- bloom — BloomModel (BLOOM model)
- blt — BltModel (Blt model)
- bridgetower — BridgeTowerModel (BridgeTower model)
- bros — BrosModel (BROS model)
- camembert — CamembertModel (CamemBERT model)
- canine — CanineModel (CANINE model)
- chameleon — ChameleonModel (Chameleon model)
- chinese_clip — ChineseCLIPModel (Chinese-CLIP model)
- chinese_clip_vision_model — ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
- clap — ClapModel (CLAP model)
- clip — CLIPModel (CLIP model)
- clip_text_model — CLIPTextModel (CLIPTextModel model)
- clip_vision_model — CLIPVisionModel (CLIPVisionModel model)
- clipseg — CLIPSegModel (CLIPSeg model)
- clvp — ClvpModelForConditionalGeneration (CLVP model)
- code_llama — LlamaModel (CodeLlama model)
- codegen — CodeGenModel (CodeGen model)
- cohere — CohereModel (Cohere model)
- cohere2 — Cohere2Model (Cohere2 model)
- cohere2_vision — Cohere2VisionModel (Cohere2Vision model)
- conditional_detr — ConditionalDetrModel (Conditional DETR model)
- convbert — ConvBertModel (ConvBERT model)
- convnext — ConvNextModel (ConvNeXT model)
- convnextv2 — ConvNextV2Model (ConvNeXTV2 model)
- cpmant — CpmAntModel (CPM-Ant model)
- csm — CsmForConditionalGeneration (CSM model)
- ctrl — CTRLModel (CTRL model)
- cvt — CvtModel (CvT model)
- d_fine — DFineModel (D-FINE model)
- dab-detr — DabDetrModel (DAB-DETR model)
- dac — DacModel (DAC model)
- data2vec-audio — Data2VecAudioModel (Data2VecAudio model)
- data2vec-text — Data2VecTextModel (Data2VecText model)
- data2vec-vision — Data2VecVisionModel (Data2VecVision model)
- dbrx — DbrxModel (DBRX model)
- deberta — DebertaModel (DeBERTa model)
- deberta-v2 — DebertaV2Model (DeBERTa-v2 model)
- decision_transformer — DecisionTransformerModel (Decision Transformer model)
- deepseek_v2 — DeepseekV2Model (DeepSeek-V2 model)
- deepseek_v3 — DeepseekV3Model (DeepSeek-V3 model)
- deepseek_vl — DeepseekVLModel (DeepseekVL model)
- deepseek_vl_hybrid — DeepseekVLHybridModel (DeepseekVLHybrid model)
- deformable_detr — DeformableDetrModel (Deformable DETR model)
- deit — DeiTModel (DeiT model)
- depth_pro — DepthProModel (DepthPro model)
- deta — DetaModel (DETA model)
- detr — DetrModel (DETR model)
- dia — DiaModel (Dia model)
- diffllama — DiffLlamaModel (DiffLlama model)
- dinat — DinatModel (DiNAT model)
- dinov2 — Dinov2Model (DINOv2 model)
- dinov2_with_registers — Dinov2WithRegistersModel (DINOv2 with Registers model)
- dinov3_convnext — DINOv3ConvNextModel (DINOv3 ConvNext model)
- dinov3_vit — DINOv3ViTModel (DINOv3 ViT model)
- distilbert — DistilBertModel (DistilBERT model)
- doge — DogeModel (Doge model)
- donut-swin — DonutSwinModel (DonutSwin model)
- dots1 — Dots1Model (dots1 model)
- dpr — DPRQuestionEncoder (DPR model)
- dpt — DPTModel (DPT model)
- edgetam — EdgeTamModel (EdgeTAM model)
- edgetam_video — EdgeTamVideoModel (EdgeTamVideo model)
- edgetam_vision_model — EdgeTamVisionModel (EdgeTamVisionModel model)
- efficientformer — EfficientFormerModel (EfficientFormer model)
- efficientloftr — EfficientLoFTRModel (EfficientLoFTR model)
- efficientnet — EfficientNetModel (EfficientNet model)
- electra — ElectraModel (ELECTRA model)
- emu3 — Emu3Model (Emu3 model)
- encodec — EncodecModel (EnCodec model)
- ernie — ErnieModel (ERNIE model)
- ernie4_5 — Ernie4_5Model (Ernie4_5 model)
- ernie4_5_moe — Ernie4_5_MoeModel (Ernie4_5_MoE model)
- ernie_m — ErnieMModel (ErnieM model)
- esm — EsmModel (ESM model)
- evolla — EvollaModel (Evolla model)
- exaone4 — Exaone4Model (EXAONE-4.0 model)
- falcon — FalconModel (Falcon model)
- falcon_h1 — FalconH1Model (FalconH1 model)
- falcon_mamba — FalconMambaModel (FalconMamba model)
- fastspeech2_conformer — FastSpeech2ConformerModel (FastSpeech2Conformer model)
- fastspeech2_conformer_with_hifigan — FastSpeech2ConformerWithHifiGan (FastSpeech2ConformerWithHifiGan model)
- flaubert — FlaubertModel (FlauBERT model)
- flava — FlavaModel (FLAVA model)
- flex_olmo — FlexOlmoModel (FlexOlmo model)
- florence2 — Florence2Model (Florence2 model)
- fnet — FNetModel (FNet model)
- focalnet — FocalNetModel (FocalNet model)
- fsmt — FSMTModel (FairSeq Machine-Translation model)
- funnel — FunnelModel or FunnelBaseModel (Funnel Transformer model)
- fuyu — FuyuModel (Fuyu model)
- gemma — GemmaModel (Gemma model)
- gemma2 — Gemma2Model (Gemma2 model)
- gemma3 — Gemma3Model (Gemma3ForConditionalGeneration model)
- gemma3_text — Gemma3TextModel (Gemma3ForCausalLM model)
- gemma3n — Gemma3nModel (Gemma3nForConditionalGeneration model)
- gemma3n_audio — Gemma3nAudioEncoder (Gemma3nAudioEncoder model)
- gemma3n_text — Gemma3nTextModel (Gemma3nForCausalLM model)
- gemma3n_vision — TimmWrapperModel (TimmWrapperModel model)
- git — GitModel (GIT model)
- glm — GlmModel (GLM model)
- glm4 — Glm4Model (GLM4 model)
- glm4_moe — Glm4MoeModel (Glm4MoE model)
- glm4v — Glm4vModel (GLM4V model)
- glm4v_moe — Glm4vMoeModel (GLM4VMOE model)
- glm4v_moe_text — Glm4vMoeTextModel (GLM4VMOE model)
- glm4v_text — Glm4vTextModel (GLM4V model)
- glpn — GLPNModel (GLPN model)
- got_ocr2 — GotOcr2Model (GOT-OCR2 model)
- gpt-sw3 — GPT2Model (GPT-Sw3 model)
- gpt2 — GPT2Model (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeModel (GPTBigCode model)
- gpt_neo — GPTNeoModel (GPT Neo model)
- gpt_neox — GPTNeoXModel (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseModel (GPT NeoX Japanese model)
- gpt_oss — GptOssModel (GptOss model)
- gptj — GPTJModel (GPT-J model)
- gptsan-japanese — GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- granite — GraniteModel (Granite model)
- granitemoe — GraniteMoeModel (GraniteMoeMoe model)
- granitemoehybrid — GraniteMoeHybridModel (GraniteMoeHybrid model)
- granitemoeshared — GraniteMoeSharedModel (GraniteMoeSharedMoe model)
- graphormer — GraphormerModel (Graphormer model)
- grounding-dino — GroundingDinoModel (Grounding DINO model)
- groupvit — GroupViTModel (GroupViT model)
- helium — HeliumModel (Helium model)
- hgnet_v2 — HGNetV2Backbone (HGNet-V2 model)
- hiera — HieraModel (Hiera model)
- hubert — HubertModel (Hubert model)
- hunyuan_v1_dense — HunYuanDenseV1Model (HunYuanDenseV1 model)
- hunyuan_v1_moe — HunYuanMoEV1Model (HunYuanMoeV1 model)
- ibert — IBertModel (I-BERT model)
- idefics — IdeficsModel (IDEFICS model)
- idefics2 — Idefics2Model (Idefics2 model)
- idefics3 — Idefics3Model (Idefics3 model)
- idefics3_vision — Idefics3VisionTransformer (Idefics3VisionTransformer model)
- ijepa — IJepaModel (I-JEPA model)
- imagegpt — ImageGPTModel (ImageGPT model)
- informer — InformerModel (Informer model)
- instructblip — InstructBlipModel (InstructBLIP model)
- instructblipvideo — InstructBlipVideoModel (InstructBlipVideo model)
- internvl — InternVLModel (InternVL model)
- internvl_vision — InternVLVisionModel (InternVLVision model)
- jamba — JambaModel (Jamba model)
- janus — JanusModel (Janus model)
- jetmoe — JetMoeModel (JetMoe model)
- jukebox — JukeboxModel (Jukebox model)
- kosmos-2 — Kosmos2Model (KOSMOS-2 model)
- kosmos-2.5 — Kosmos2_5Model (KOSMOS-2.5 model)
- kyutai_speech_to_text — KyutaiSpeechToTextModel (KyutaiSpeechToText model)
- layoutlm — LayoutLMModel (LayoutLM model)
- layoutlmv2 — LayoutLMv2Model (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Model (LayoutLMv3 model)
- led — LEDModel (LED model)
- levit — LevitModel (LeViT model)
- lfm2 — Lfm2Model (Lfm2 model)
- lfm2_vl — Lfm2VlModel (Lfm2Vl model)
- lightglue — LightGlueForKeypointMatching (LightGlue model)
- lilt — LiltModel (LiLT model)
- llama — LlamaModel (LLaMA model)
- llama4 — Llama4ForConditionalGeneration (Llama4 model)
- llama4_text — Llama4TextModel (Llama4ForCausalLM model)
- llava — LlavaModel (LLaVa model)
- llava_next — LlavaNextModel (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoModel (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionModel (LLaVA-Onevision model)
- longcat_flash — LongcatFlashModel (LongCatFlash model)
- longformer — LongformerModel (Longformer model)
- longt5 — LongT5Model (LongT5 model)
- luke — LukeModel (LUKE model)
- lxmert — LxmertModel (LXMERT model)
- m2m_100 — M2M100Model (M2M100 model)
- mamba — MambaModel (Mamba model)
- mamba2 — Mamba2Model (mamba2 model)
- marian — MarianModel (Marian model)
- markuplm — MarkupLMModel (MarkupLM model)
- mask2former — Mask2FormerModel (Mask2Former model)
- maskformer — MaskFormerModel (MaskFormer model)
- maskformer-swin — MaskFormerSwinModel (MaskFormerSwin model)
- mbart — MBartModel (mBART model)
- mctct — MCTCTModel (M-CTC-T model)
- mega — MegaModel (MEGA model)
- megatron-bert — MegatronBertModel (Megatron-BERT model)
- metaclip_2 — MetaClip2Model (MetaCLIP 2 model)
- mgp-str — MgpstrForSceneTextRecognition (MGP-STR model)
- mimi — MimiModel (Mimi model)
- minimax — MiniMaxModel (MiniMax model)
- ministral — MinistralModel (Ministral model)
- mistral — MistralModel (Mistral model)
- mistral3 — Mistral3Model (Mistral3 model)
- mixtral — MixtralModel (Mixtral model)
- mlcd — MLCDVisionModel (MLCD model)
- mllama — MllamaModel (Mllama model)
- mm-grounding-dino — MMGroundingDinoModel (MM Grounding DINO model)
- mobilebert — MobileBertModel (MobileBERT model)
- mobilenet_v1 — MobileNetV1Model (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2Model (MobileNetV2 model)
- mobilevit — MobileViTModel (MobileViT model)
- mobilevitv2 — MobileViTV2Model (MobileViTV2 model)
- modernbert — ModernBertModel (ModernBERT model)
- modernbert-decoder — ModernBertDecoderModel (ModernBertDecoder model)
- moonshine — MoonshineModel (Moonshine model)
- moshi — MoshiModel (Moshi model)
- mpnet — MPNetModel (MPNet model)
- mpt — MptModel (MPT model)
- mra — MraModel (MRA model)
- mt5 — MT5Model (MT5 model)
- musicgen — MusicgenModel (MusicGen model)
- musicgen_melody — MusicgenMelodyModel (MusicGen Melody model)
- mvp — MvpModel (MVP model)
- nat — NatModel (NAT model)
- nemotron — NemotronModel (Nemotron model)
- nezha — NezhaModel (Nezha model)
- nllb-moe — NllbMoeModel (NLLB-MOE model)
- nystromformer — NystromformerModel (Nyströmformer model)
- olmo — OlmoModel (OLMo model)
- olmo2 — Olmo2Model (OLMo2 model)
- olmo3 — Olmo3Model (Olmo3 model)
- olmoe — OlmoeModel (OLMoE model)
- omdet-turbo — OmDetTurboForObjectDetection (OmDet-Turbo model)
- oneformer — OneFormerModel (OneFormer model)
- open-llama — OpenLlamaModel (OpenLlama model)
- openai-gpt — OpenAIGPTModel (OpenAI GPT model)
- opt — OPTModel (OPT model)
- ovis2 — Ovis2Model (Ovis2 model)
- owlv2 — Owlv2Model (OWLv2 model)
- owlvit — OwlViTModel (OWL-ViT model)
- paligemma — PaliGemmaModel (PaliGemma model)
- parakeet_ctc — ParakeetForCTC (Parakeet model)
- parakeet_encoder — ParakeetEncoder (ParakeetEncoder model)
- patchtsmixer — PatchTSMixerModel (PatchTSMixer model)
- patchtst — PatchTSTModel (PatchTST model)
- pegasus — PegasusModel (Pegasus model)
- pegasus_x — PegasusXModel (PEGASUS-X model)
- perceiver — PerceiverModel (Perceiver model)
- perception_encoder — PerceptionEncoder (PerceptionEncoder model)
- perception_lm — PerceptionLMModel (PerceptionLM model)
- persimmon — PersimmonModel (Persimmon model)
- phi — PhiModel (Phi model)
- phi3 — Phi3Model (Phi3 model)
- phi4_multimodal — Phi4MultimodalModel (Phi4Multimodal model)
- phimoe — PhimoeModel (Phimoe model)
- pixtral — PixtralVisionModel (Pixtral model)
- plbart — PLBartModel (PLBart model)
- poolformer — PoolFormerModel (PoolFormer model)
- prophetnet — ProphetNetModel (ProphetNet model)
- pvt — PvtModel (PVT model)
- pvt_v2 — PvtV2Model (PVTv2 model)
- qdqbert — QDQBertModel (QDQBert model)
- qwen2 — Qwen2Model (Qwen2 model)
- qwen2_5_vl — Qwen2_5_VLModel (Qwen2_5_VL model)
- qwen2_5_vl_text — Qwen2_5_VLTextModel (Qwen2_5_VL model)
- qwen2_audio_encoder — Qwen2AudioEncoder (Qwen2AudioEncoder model)
- qwen2_moe — Qwen2MoeModel (Qwen2MoE model)
- qwen2_vl — Qwen2VLModel (Qwen2VL model)
- qwen2_vl_text — Qwen2VLTextModel (Qwen2VL model)
- qwen3 — Qwen3Model (Qwen3 model)
- qwen3_moe — Qwen3MoeModel (Qwen3MoE model)
- qwen3_next — Qwen3NextModel (Qwen3Next model)
- qwen3_vl — Qwen3VLModel (Qwen3VL model)
- qwen3_vl_moe — Qwen3VLMoeModel (Qwen3VLMoe model)
- qwen3_vl_moe_text — Qwen3VLMoeTextModel (Qwen3VLMoe model)
- qwen3_vl_text — Qwen3VLTextModel (Qwen3VL model)
- recurrent_gemma — RecurrentGemmaModel (RecurrentGemma model)
- reformer — ReformerModel (Reformer model)
- regnet — RegNetModel (RegNet model)
- rembert — RemBertModel (RemBERT model)
- resnet — ResNetModel (ResNet model)
- retribert — RetriBertModel (RetriBERT model)
- roberta — RobertaModel (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertModel (RoCBert model)
- roformer — RoFormerModel (RoFormer model)
- rt_detr — RTDetrModel (RT-DETR model)
- rt_detr_v2 — RTDetrV2Model (RT-DETRv2 model)
- rwkv — RwkvModel (RWKV model)
- sam — SamModel (SAM model)
- sam2 — Sam2Model (SAM2 model)
- sam2_hiera_det_model — Sam2HieraDetModel (Sam2HieraDetModel model)
- sam2_video — Sam2VideoModel (Sam2VideoModel model)
- sam2_vision_model — Sam2VisionModel (Sam2VisionModel model)
- sam_hq — SamHQModel (SAM-HQ model)
- sam_hq_vision_model — SamHQVisionModel (SamHQVisionModel model)
- sam_vision_model — SamVisionModel (SamVisionModel model)
- seamless_m4t — SeamlessM4TModel (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2Model (SeamlessM4Tv2 model)
- seed_oss — SeedOssModel (SeedOss model)
- segformer — SegformerModel (SegFormer model)
- seggpt — SegGptModel (SegGPT model)
- sew — SEWModel (SEW model)
- sew-d — SEWDModel (SEW-D model)
- siglip — SiglipModel (SigLIP model)
- siglip2 — Siglip2Model (SigLIP2 model)
- siglip2_vision_model — Siglip2VisionModel (Siglip2VisionModel model)
- siglip_vision_model — SiglipVisionModel (SiglipVisionModel model)
- smollm3 — SmolLM3Model (SmolLM3 model)
- smolvlm — SmolVLMModel (SmolVLM model)
- smolvlm_vision — SmolVLMVisionTransformer (SmolVLMVisionTransformer model)
- speech_to_text — Speech2TextModel (Speech2Text model)
- speecht5 — SpeechT5Model (SpeechT5 model)
- splinter — SplinterModel (Splinter model)
- squeezebert — SqueezeBertModel (SqueezeBERT model)
- stablelm — StableLmModel (StableLm model)
- starcoder2 — Starcoder2Model (Starcoder2 model)
- swiftformer — SwiftFormerModel (SwiftFormer model)
- swin — SwinModel (Swin Transformer model)
- swin2sr — Swin2SRModel (Swin2SR model)
- swinv2 — Swinv2Model (Swin Transformer V2 model)
- switch_transformers — SwitchTransformersModel (SwitchTransformers model)
- t5 — T5Model (T5 model)
- t5gemma — T5GemmaModel (T5Gemma model)
- table-transformer — TableTransformerModel (Table Transformer model)
- tapas — TapasModel (TAPAS model)
- textnet — TextNetModel (TextNet model)
- time_series_transformer — TimeSeriesTransformerModel (Time Series Transformer model)
- timesfm — TimesFmModel (TimesFm model)
- timesformer — TimesformerModel (TimeSformer model)
- timm_backbone — TimmBackbone (TimmBackbone model)
- timm_wrapper — TimmWrapperModel (TimmWrapperModel model)
- trajectory_transformer — TrajectoryTransformerModel (Trajectory Transformer model)
- transfo-xl — TransfoXLModel (Transformer-XL model)
- tvlt — TvltModel (TVLT model)
- tvp — TvpModel (TVP model)
- udop — UdopModel (UDOP model)
- umt5 — UMT5Model (UMT5 model)
- unispeech — UniSpeechModel (UniSpeech model)
- unispeech-sat — UniSpeechSatModel (UniSpeechSat model)
- univnet — UnivNetModel (UnivNet model)
- van — VanModel (VAN model)
- vaultgemma — VaultGemmaModel (VaultGemma model)
- video_llava — VideoLlavaModel (VideoLlava model)
- videomae — VideoMAEModel (VideoMAE model)
- vilt — ViltModel (ViLT model)
- vipllava — VipLlavaModel (VipLlava model)
- vision-text-dual-encoder — VisionTextDualEncoderModel (VisionTextDualEncoder model)
- visual_bert — VisualBertModel (VisualBERT model)
- vit — ViTModel (ViT model)
- vit_hybrid — ViTHybridModel (ViT Hybrid model)
- vit_mae — ViTMAEModel (ViTMAE model)
- vit_msn — ViTMSNModel (ViTMSN model)
- vitdet — VitDetModel (VitDet model)
- vits — VitsModel (VITS model)
- vivit — VivitModel (ViViT model)
- vjepa2 — VJEPA2Model (VJEPA2Model model)
- voxtral — VoxtralForConditionalGeneration (Voxtral model)
- voxtral_encoder — VoxtralEncoder (Voxtral Encoder model)
- wav2vec2 — Wav2Vec2Model (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertModel (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
- wavlm — WavLMModel (WavLM model)
- whisper — WhisperModel (Whisper model)
- xclip — XCLIPModel (X-CLIP model)
- xcodec — XcodecModel (X-CODEC model)
- xglm — XGLMModel (XGLM model)
- xlm — XLMModel (XLM model)
- xlm-prophetnet — XLMProphetNetModel (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaModel (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLModel (XLM-RoBERTa-XL model)
- xlnet — XLNetModel (XLNet model)
- xlstm — xLSTMModel (xLSTM model)
- xmod — XmodModel (X-MOD model)
- yolos — YolosModel (YOLOS model)
- yoso — YosoModel (YOSO model)
- zamba — ZambaModel (Zamba model)
- zamba2 — Zamba2Model (Zamba2 model)
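The two-step selection above can be sketched in plain Python. This is a hypothetical illustration of the dispatch order, not the real transformers internals: an explicit model_type on the config wins, and only when it is missing does the resolver fall back to pattern matching on the checkpoint name or path.

```python
# Illustrative subset of the model_type -> model class mapping above.
MODEL_MAPPING = {"bert": "BertModel", "gpt2": "GPT2Model", "t5": "T5Model"}

def select_model_class(name_or_path, model_type=None):
    if model_type is not None:
        # 1) The config's model_type takes precedence.
        return MODEL_MAPPING[model_type]
    # 2) Fallback: pattern matching on the name/path (longest keys first,
    #    so that e.g. "gpt2" is not shadowed by a shorter key).
    for key in sorted(MODEL_MAPPING, key=len, reverse=True):
        if key in name_or_path:
            return MODEL_MAPPING[key]
    raise ValueError(f"Could not infer a model class from {name_or_path!r}")

print(select_model_class("google-bert/bert-base-cased"))  # BertModel
print(select_model_class("anything", model_type="t5"))    # T5Model
```

This is also why the fallback can misfire on checkpoints with ambiguous names: when possible, rely on a config.json with a proper model_type rather than the path heuristic.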
The model is set in evaluation mode by default using model.eval() (so, for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModel.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertModel (ALBERT model)
- BartConfig configuration class: TFBartModel (BART model)
- BertConfig configuration class: TFBertModel (BERT model)
- BlenderbotConfig configuration class: TFBlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: TFBlenderbotSmallModel (BlenderbotSmall model)
- BlipConfig configuration class: TFBlipModel (BLIP model)
- CLIPConfig configuration class: TFCLIPModel (CLIP model)
- CTRLConfig configuration class: TFCTRLModel (CTRL model)
- CamembertConfig configuration class: TFCamembertModel (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertModel (ConvBERT model)
- ConvNextConfig configuration class: TFConvNextModel (ConvNeXT model)
- ConvNextV2Config configuration class: TFConvNextV2Model (ConvNeXTV2 model)
- CvtConfig configuration class: TFCvtModel (CvT model)
- DPRConfig configuration class: TFDPRQuestionEncoder (DPR model)
- Data2VecVisionConfig configuration class: TFData2VecVisionModel (Data2VecVision model)
- DebertaConfig configuration class: TFDebertaModel (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2Model (DeBERTa-v2 model)
- DeiTConfig configuration class: TFDeiTModel (DeiT model)
- DistilBertConfig configuration class: TFDistilBertModel (DistilBERT model)
- EfficientFormerConfig configuration class: TFEfficientFormerModel (EfficientFormer model)
- ElectraConfig configuration class: TFElectraModel (ELECTRA model)
- EsmConfig configuration class: TFEsmModel (ESM model)
- FlaubertConfig configuration class: TFFlaubertModel (FlauBERT model)
- FunnelConfig configuration class: TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
- GPT2Config configuration class: TFGPT2Model (OpenAI GPT-2 model)
- GPTJConfig configuration class: TFGPTJModel (GPT-J model)
- GroupViTConfig configuration class: TFGroupViTModel (GroupViT model)
- HubertConfig configuration class: TFHubertModel (Hubert model)
- IdeficsConfig configuration class: TFIdeficsModel (IDEFICS model)
- LEDConfig configuration class: TFLEDModel (LED model)
- LayoutLMConfig configuration class: TFLayoutLMModel (LayoutLM model)
- LayoutLMv3Config configuration class: TFLayoutLMv3Model (LayoutLMv3 model)
- LongformerConfig configuration class: TFLongformerModel (Longformer model)
- LxmertConfig configuration class: TFLxmertModel (LXMERT model)
- MBartConfig configuration class: TFMBartModel (mBART model)
- MPNetConfig configuration class: TFMPNetModel (MPNet model)
- MT5Config configuration class: TFMT5Model (MT5 model)
- MarianConfig configuration class: TFMarianModel (Marian model)
- MistralConfig configuration class: TFMistralModel (Mistral model)
- MobileBertConfig configuration class: TFMobileBertModel (MobileBERT model)
- MobileViTConfig configuration class: TFMobileViTModel (MobileViT model)
- OPTConfig configuration class: TFOPTModel (OPT model)
- OpenAIGPTConfig configuration class: TFOpenAIGPTModel (OpenAI GPT model)
- PegasusConfig configuration class: TFPegasusModel (Pegasus model)
- RegNetConfig configuration class: TFRegNetModel (RegNet model)
- RemBertConfig configuration class: TFRemBertModel (RemBERT model)
- ResNetConfig configuration class: TFResNetModel (ResNet model)
- RoFormerConfig configuration class: TFRoFormerModel (RoFormer model)
- RobertaConfig configuration class: TFRobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- SamConfig configuration class: TFSamModel (SAM model)
- SamVisionConfig configuration class: TFSamVisionModel (SamVisionModel model)
- SegformerConfig configuration class: TFSegformerModel (SegFormer model)
- Speech2TextConfig configuration class: TFSpeech2TextModel (Speech2Text model)
- SwiftFormerConfig configuration class: TFSwiftFormerModel (SwiftFormer model)
- SwinConfig configuration class: TFSwinModel (Swin Transformer model)
- T5Config configuration class: TFT5Model (T5 model)
- TapasConfig configuration class: TFTapasModel (TAPAS model)
- TransfoXLConfig configuration class: TFTransfoXLModel (Transformer-XL model)
- ViTConfig configuration class: TFViTModel (ViT model)
- ViTMAEConfig configuration class: TFViTMAEModel (ViTMAE model)
- VisionTextDualEncoderConfig configuration class: TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
- Wav2Vec2Config configuration class: TFWav2Vec2Model (Wav2Vec2 model)
- WhisperConfig configuration class: TFWhisperModel (Whisper model)
- XGLMConfig configuration class: TFXGLMModel (XGLM model)
- XLMConfig configuration class: TFXLMModel (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaModel (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetModel (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
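The distinction above can be sketched in plain Python. This is a hypothetical mimic of the pattern, not the library's actual implementation: from_config builds the architecture with freshly initialized weights, while from_pretrained additionally loads saved weights over it.

```python
# Hypothetical sketch (not transformers code): from_config gives a
# randomly initialized model; from_pretrained overwrites the weights
# with a saved checkpoint.
import random

class TinyModel:
    def __init__(self, config):
        self.config = config
        # fresh, randomly initialized "weights"
        self.weights = [random.random() for _ in range(config["hidden_size"])]

    @classmethod
    def from_config(cls, config):
        return cls(config)                    # architecture only, random weights

    @classmethod
    def from_pretrained(cls, config, saved_weights):
        model = cls(config)
        model.weights = list(saved_weights)   # overwrite with pretrained weights
        return model

config = {"hidden_size": 4}
saved = [0.1, 0.2, 0.3, 0.4]
a = TinyModel.from_config(config)             # random weights
b = TinyModel.from_pretrained(config, saved)
print(b.weights)  # [0.1, 0.2, 0.3, 0.4]
```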
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
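The kwargs-splitting behavior described above can be sketched in plain Python. This is a simplified, hypothetical mimic of the real loading logic, with made-up names (DemoConfig, split_kwargs), not the library's code:

```python
# Hypothetical sketch of how from_pretrained splits **kwargs when no
# `config` is passed: keys that match config attributes update the
# config, the rest are forwarded to the model's __init__.

class DemoConfig:
    def __init__(self):
        self.output_attentions = False
        self.hidden_size = 768

def split_kwargs(config, **kwargs):
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)   # overrides a config attribute
        else:
            model_kwargs[key] = value     # passed on to the model __init__
    return config, model_kwargs

config, model_kwargs = split_kwargs(DemoConfig(), output_attentions=True, foo=1)
print(config.output_attentions)  # True
print(model_kwargs)              # {'foo': 1}
```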
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertModel (ALBERT model)
- bart — TFBartModel (BART model)
- bert — TFBertModel (BERT model)
- blenderbot — TFBlenderbotModel (Blenderbot model)
- blenderbot-small — TFBlenderbotSmallModel (BlenderbotSmall model)
- blip — TFBlipModel (BLIP model)
- camembert — TFCamembertModel (CamemBERT model)
- clip — TFCLIPModel (CLIP model)
- convbert — TFConvBertModel (ConvBERT model)
- convnext — TFConvNextModel (ConvNeXT model)
- convnextv2 — TFConvNextV2Model (ConvNeXTV2 model)
- ctrl — TFCTRLModel (CTRL model)
- cvt — TFCvtModel (CvT model)
- data2vec-vision — TFData2VecVisionModel (Data2VecVision model)
- deberta — TFDebertaModel (DeBERTa model)
- deberta-v2 — TFDebertaV2Model (DeBERTa-v2 model)
- deit — TFDeiTModel (DeiT model)
- distilbert — TFDistilBertModel (DistilBERT model)
- dpr — TFDPRQuestionEncoder (DPR model)
- efficientformer — TFEfficientFormerModel (EfficientFormer model)
- electra — TFElectraModel (ELECTRA model)
- esm — TFEsmModel (ESM model)
- flaubert — TFFlaubertModel (FlauBERT model)
- funnel — TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
- gpt-sw3 — TFGPT2Model (GPT-Sw3 model)
- gpt2 — TFGPT2Model (OpenAI GPT-2 model)
- gptj — TFGPTJModel (GPT-J model)
- groupvit — TFGroupViTModel (GroupViT model)
- hubert — TFHubertModel (Hubert model)
- idefics — TFIdeficsModel (IDEFICS model)
- layoutlm — TFLayoutLMModel (LayoutLM model)
- layoutlmv3 — TFLayoutLMv3Model (LayoutLMv3 model)
- led — TFLEDModel (LED model)
- longformer — TFLongformerModel (Longformer model)
- lxmert — TFLxmertModel (LXMERT model)
- marian — TFMarianModel (Marian model)
- mbart — TFMBartModel (mBART model)
- mistral — TFMistralModel (Mistral model)
- mobilebert — TFMobileBertModel (MobileBERT model)
- mobilevit — TFMobileViTModel (MobileViT model)
- mpnet — TFMPNetModel (MPNet model)
- mt5 — TFMT5Model (MT5 model)
- openai-gpt — TFOpenAIGPTModel (OpenAI GPT model)
- opt — TFOPTModel (OPT model)
- pegasus — TFPegasusModel (Pegasus model)
- regnet — TFRegNetModel (RegNet model)
- rembert — TFRemBertModel (RemBERT model)
- resnet — TFResNetModel (ResNet model)
- roberta — TFRobertaModel (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerModel (RoFormer model)
- sam — TFSamModel (SAM model)
- sam_vision_model — TFSamVisionModel (SamVisionModel model)
- segformer — TFSegformerModel (SegFormer model)
- speech_to_text — TFSpeech2TextModel (Speech2Text model)
- swiftformer — TFSwiftFormerModel (SwiftFormer model)
- swin — TFSwinModel (Swin Transformer model)
- t5 — TFT5Model (T5 model)
- tapas — TFTapasModel (TAPAS model)
- transfo-xl — TFTransfoXLModel (Transformer-XL model)
- vision-text-dual-encoder — TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
- vit — TFViTModel (ViT model)
- vit_mae — TFViTMAEModel (ViTMAE model)
- wav2vec2 — TFWav2Vec2Model (Wav2Vec2 model)
- whisper — TFWhisperModel (Whisper model)
- xglm — TFXGLMModel (XGLM model)
- xlm — TFXLMModel (XLM model)
- xlm-roberta — TFXLMRobertaModel (XLM-RoBERTa model)
- xlnet — TFXLNetModel (XLNet model)
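The selection logic above, dispatch on the config's model_type with a fallback to pattern matching on the name or path, can be illustrated with a small, hypothetical registry. The mapping and function names here are made up for illustration; this is not the actual transformers implementation:

```python
# Hypothetical mini version of the auto-model mapping: first try the
# config's model_type, then fall back to substring matching on the
# pretrained name/path (longest pattern first, so "xlm-roberta" wins
# over "roberta").

MODEL_MAPPING = {
    "bert": "TFBertModel",
    "roberta": "TFRobertaModel",
    "xlm-roberta": "TFXLMRobertaModel",
}

def resolve_model_class(name_or_path, model_type=None):
    if model_type is not None:
        return MODEL_MAPPING[model_type]
    # fallback: pattern matching on the name/path
    for pattern in sorted(MODEL_MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return MODEL_MAPPING[pattern]
    raise ValueError(f"Could not infer model type from {name_or_path!r}")

print(resolve_model_class("google-bert/bert-base-cased"))     # TFBertModel
print(resolve_model_class("FacebookAI/xlm-roberta-base"))     # TFXLMRobertaModel
print(resolve_model_class("any/path", model_type="roberta"))  # TFRobertaModel
```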
Examples:
>>> from transformers import AutoConfig, TFAutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModel.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertModel (ALBERT model)
- BartConfig configuration class: FlaxBartModel (BART model)
- BeitConfig configuration class: FlaxBeitModel (BEiT model)
- BertConfig configuration class: FlaxBertModel (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdModel (BigBird model)
- BlenderbotConfig configuration class: FlaxBlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: FlaxBlenderbotSmallModel (BlenderbotSmall model)
- BloomConfig configuration class: FlaxBloomModel (BLOOM model)
- CLIPConfig configuration class: FlaxCLIPModel (CLIP model)
- Dinov2Config configuration class: FlaxDinov2Model (DINOv2 model)
- DistilBertConfig configuration class: FlaxDistilBertModel (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraModel (ELECTRA model)
- GPT2Config configuration class: FlaxGPT2Model (OpenAI GPT-2 model)
- GPTJConfig configuration class: FlaxGPTJModel (GPT-J model)
- GPTNeoConfig configuration class: FlaxGPTNeoModel (GPT Neo model)
- GemmaConfig configuration class: FlaxGemmaModel (Gemma model)
- LlamaConfig configuration class: FlaxLlamaModel (LLaMA model)
- LongT5Config configuration class: FlaxLongT5Model (LongT5 model)
- MBartConfig configuration class: FlaxMBartModel (mBART model)
- MT5Config configuration class: FlaxMT5Model (MT5 model)
- MarianConfig configuration class: FlaxMarianModel (Marian model)
- MistralConfig configuration class: FlaxMistralModel (Mistral model)
- OPTConfig configuration class: FlaxOPTModel (OPT model)
- PegasusConfig configuration class: FlaxPegasusModel (Pegasus model)
- RegNetConfig configuration class: FlaxRegNetModel (RegNet model)
- ResNetConfig configuration class: FlaxResNetModel (ResNet model)
- RoFormerConfig configuration class: FlaxRoFormerModel (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- T5Config configuration class: FlaxT5Model (T5 model)
- ViTConfig configuration class: FlaxViTModel (ViT model)
- VisionTextDualEncoderConfig configuration class: FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
- Wav2Vec2Config configuration class: FlaxWav2Vec2Model (Wav2Vec2 model)
- WhisperConfig configuration class: FlaxWhisperModel (Whisper model)
- XGLMConfig configuration class: FlaxXGLMModel (XGLM model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaModel (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertModel (ALBERT model)
- bart — FlaxBartModel (BART model)
- beit — FlaxBeitModel (BEiT model)
- bert — FlaxBertModel (BERT model)
- big_bird — FlaxBigBirdModel (BigBird model)
- blenderbot — FlaxBlenderbotModel (Blenderbot model)
- blenderbot-small — FlaxBlenderbotSmallModel (BlenderbotSmall model)
- bloom — FlaxBloomModel (BLOOM model)
- clip — FlaxCLIPModel (CLIP model)
- dinov2 — FlaxDinov2Model (DINOv2 model)
- distilbert — FlaxDistilBertModel (DistilBERT model)
- electra — FlaxElectraModel (ELECTRA model)
- gemma — FlaxGemmaModel (Gemma model)
- gpt-sw3 — FlaxGPT2Model (GPT-Sw3 model)
- gpt2 — FlaxGPT2Model (OpenAI GPT-2 model)
- gpt_neo — FlaxGPTNeoModel (GPT Neo model)
- gptj — FlaxGPTJModel (GPT-J model)
- llama — FlaxLlamaModel (LLaMA model)
- longt5 — FlaxLongT5Model (LongT5 model)
- marian — FlaxMarianModel (Marian model)
- mbart — FlaxMBartModel (mBART model)
- mistral — FlaxMistralModel (Mistral model)
- mt5 — FlaxMT5Model (MT5 model)
- opt — FlaxOPTModel (OPT model)
- pegasus — FlaxPegasusModel (Pegasus model)
- regnet — FlaxRegNetModel (RegNet model)
- resnet — FlaxResNetModel (ResNet model)
- roberta — FlaxRobertaModel (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerModel (RoFormer model)
- t5 — FlaxT5Model (T5 model)
- vision-text-dual-encoder — FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
- vit — FlaxViTModel (ViT model)
- wav2vec2 — FlaxWav2Vec2Model (Wav2Vec2 model)
- whisper — FlaxWhisperModel (Whisper model)
- xglm — FlaxXGLMModel (XGLM model)
- xlm-roberta — FlaxXLMRobertaModel (XLM-RoBERTa model)
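Two details of the list above are worth noting: several keys can map to the same class (gpt2 and gpt-sw3 both resolve to FlaxGPT2Model), and name-based fallback matching is inherently ambiguous because some keys are substrings of others ("t5" occurs inside "mt5"). A small hypothetical sketch, with a made-up mapping and function name, illustrates why longer patterns must be tried first:

```python
# Hypothetical sketch of the fallback pattern matching on the model
# name/path. Ambiguous substrings ("t5" occurs inside "mt5") make the
# matching order matter, which is why dispatch on the config's
# model_type is preferred over name-based inference.

FLAX_MODEL_MAPPING = {
    "t5": "FlaxT5Model",
    "mt5": "FlaxMT5Model",
    "gpt2": "FlaxGPT2Model",
    "gpt-sw3": "FlaxGPT2Model",   # two keys, one class
}

def infer_from_name(name_or_path):
    # try the longest key first, so "mt5" is checked before "t5"
    for key in sorted(FLAX_MODEL_MAPPING, key=len, reverse=True):
        if key in name_or_path:
            return FLAX_MODEL_MAPPING[key]
    raise ValueError(f"cannot infer model type from {name_or_path!r}")

print(infer_from_name("google/mt5-small"))   # FlaxMT5Model
print(infer_from_name("google-t5/t5-base"))  # FlaxT5Model
```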
Examples:
>>> from transformers import AutoConfig, FlaxAutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModel.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
Generic pretraining classes
The following auto classes are available for instantiating models with a pretraining head.
AutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForPreTraining (ALBERT model)
- BartConfig configuration class: BartForConditionalGeneration (BART model)
- BertConfig configuration class: BertForPreTraining (BERT model)
- BigBirdConfig configuration class: BigBirdForPreTraining (BigBird model)
- BloomConfig configuration class: BloomForCausalLM (BLOOM model)
- CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: CamembertForMaskedLM (CamemBERT model)
- ColPaliConfig configuration class: ColPaliForRetrieval (ColPali model)
- ColQwen2Config configuration class: ColQwen2ForRetrieval (ColQwen2 model)
- Data2VecTextConfig configuration class: Data2VecTextForMaskedLM (Data2VecText model)
- DebertaConfig configuration class: DebertaForMaskedLM (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForMaskedLM (DeBERTa-v2 model)
- DistilBertConfig configuration class: DistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: ElectraForPreTraining (ELECTRA model)
- ErnieConfig configuration class: ErnieForPreTraining (ERNIE model)
- EvollaConfig configuration class: EvollaForProteinText2Text (Evolla model)
- Exaone4Config configuration class: Exaone4ForCausalLM (EXAONE-4.0 model)
- FNetConfig configuration class: FNetForPreTraining (FNet model)
- FSMTConfig configuration class: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
- FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
- FlaubertConfig configuration class: FlaubertWithLMHeadModel (FlauBERT model)
- FlavaConfig configuration class: FlavaForPreTraining (FLAVA model)
- Florence2Config configuration class: Florence2ForConditionalGeneration (Florence2 model)
- FunnelConfig configuration class: FunnelForPreTraining (Funnel Transformer model)
- GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
- GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
- HieraConfig configuration class: HieraForPreTraining (Hiera model)
- IBertConfig configuration class: IBertForMaskedLM (I-BERT model)
- Idefics2Config configuration class: Idefics2ForConditionalGeneration (Idefics2 model)
- Idefics3Config configuration class: Idefics3ForConditionalGeneration (Idefics3 model)
- IdeficsConfig configuration class: IdeficsForVisionText2Text (IDEFICS model)
- JanusConfig configuration class: JanusForConditionalGeneration (Janus model)
- LayoutLMConfig configuration class: LayoutLMForMaskedLM (LayoutLM model)
- LlavaConfig configuration class: LlavaForConditionalGeneration (LLaVa model)
- LlavaNextConfig configuration class: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
- LlavaNextVideoConfig configuration class: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
- LlavaOnevisionConfig configuration class: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
- LongformerConfig configuration class: LongformerForMaskedLM (Longformer model)
- LukeConfig configuration class: LukeForMaskedLM (LUKE model)
- LxmertConfig configuration class: LxmertForPreTraining (LXMERT model)
- MPNetConfig configuration class: MPNetForMaskedLM (MPNet model)
- Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
- MambaConfig configuration class: MambaForCausalLM (Mamba model)
- MegaConfig configuration class: MegaForMaskedLM (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForPreTraining (Megatron-BERT model)
- Mistral3Config configuration class: Mistral3ForConditionalGeneration (Mistral3 model)
- MllamaConfig configuration class: MllamaForConditionalGeneration (Mllama model)
- MobileBertConfig configuration class: MobileBertForPreTraining (MobileBERT model)
- MptConfig configuration class: MptForCausalLM (MPT model)
- MraConfig configuration class: MraForMaskedLM (MRA model)
- MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
- NezhaConfig configuration class: NezhaForPreTraining (Nezha model)
- NllbMoeConfig configuration class: NllbMoeForConditionalGeneration (NLLB-MOE model)
- OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
- PaliGemmaConfig configuration class: PaliGemmaForConditionalGeneration (PaliGemma model)
- Qwen2AudioConfig configuration class: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
- RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
- RoCBertConfig configuration class: RoCBertForPreTraining (RoCBert model)
- RobertaConfig configuration class: RobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
- SplinterConfig configuration class: SplinterForPreTraining (Splinter model)
- SqueezeBertConfig configuration class: SqueezeBertForMaskedLM (SqueezeBERT model)
- SwitchTransformersConfig configuration class: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
- T5Config configuration class: T5ForConditionalGeneration (T5 model)
- T5GemmaConfig configuration class: T5GemmaForConditionalGeneration (T5Gemma model)
- TapasConfig configuration class: TapasForMaskedLM (TAPAS model)
- TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
- TvltConfig configuration class: TvltForPreTraining (TVLT model)
- UniSpeechConfig configuration class: UniSpeechForPreTraining (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatForPreTraining (UniSpeechSat model)
- ViTMAEConfig configuration class: ViTMAEForPreTraining (ViTMAE model)
- VideoLlavaConfig configuration class: VideoLlavaForConditionalGeneration (VideoLlava model)
- VideoMAEConfig configuration class: VideoMAEForPreTraining (VideoMAE model)
- VipLlavaConfig configuration class: VipLlavaForConditionalGeneration (VipLlava model)
- VisualBertConfig configuration class: VisualBertForPreTraining (VisualBERT model)
- VoxtralConfig configuration class: VoxtralForConditionalGeneration (Voxtral model)
- Wav2Vec2Config configuration class: Wav2Vec2ForPreTraining (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForPreTraining (Wav2Vec2-Conformer model)
- XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForMaskedLM (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
- XmodConfig configuration class: XmodForMaskedLM (X-MOD model)
- xLSTMConfig configuration class: xLSTMForCausalLM (xLSTM model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a tensorflow index checkpoint file (e.g,
./tf_model/model.ckpt.index). In this case,from_tfshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults toFalse) — Load the model weights from a TensorFlow checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
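The two kwargs paths described above can be exercised without any download; a minimal sketch using a tiny, randomly initialized BERT (the sizes below are hypothetical and chosen only to keep the model small):

```python
import tempfile

from transformers import AutoModelForPreTraining, BertConfig, BertForPreTraining

# Tiny, randomly initialized BERT purely for illustration -- the sizes are
# hypothetical and no pretrained weights are downloaded.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

with tempfile.TemporaryDirectory() as tmp:
    BertForPreTraining(config).save_pretrained(tmp)

    # No `config` argument: kwargs that match configuration attributes update
    # the automatically loaded configuration before the model is built.
    model = AutoModelForPreTraining.from_pretrained(tmp, output_attentions=True)
    print(model.config.output_attentions)

    # Explicit `config` argument: the configuration is taken as-is, so changes
    # must be made on the config object before passing it in.
    config.output_attentions = True
    model = AutoModelForPreTraining.from_pretrained(tmp, config=config)
    print(model.config.output_attentions)
```

Both calls print `True`; the difference is only where the configuration change happens.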
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForPreTraining (ALBERT model)
- bart — BartForConditionalGeneration (BART model)
- bert — BertForPreTraining (BERT model)
- big_bird — BigBirdForPreTraining (BigBird model)
- bloom — BloomForCausalLM (BLOOM model)
- camembert — CamembertForMaskedLM (CamemBERT model)
- colpali —
ColPaliForRetrieval(ColPali model) - colqwen2 —
ColQwen2ForRetrieval(ColQwen2 model) - ctrl — CTRLLMHeadModel (CTRL model)
- data2vec-text — Data2VecTextForMaskedLM (Data2VecText model)
- deberta — DebertaForMaskedLM (DeBERTa model)
- deberta-v2 — DebertaV2ForMaskedLM (DeBERTa-v2 model)
- distilbert —
DistilBertForMaskedLM(DistilBERT model) - electra —
ElectraForPreTraining(ELECTRA model) - ernie —
ErnieForPreTraining(ERNIE model) - evolla —
EvollaForProteinText2Text(Evolla model) - exaone4 —
Exaone4ForCausalLM(EXAONE-4.0 model) - falcon_mamba —
FalconMambaForCausalLM(FalconMamba model) - flaubert —
FlaubertWithLMHeadModel(FlauBERT model) - flava —
FlavaForPreTraining(FLAVA model) - florence2 —
Florence2ForConditionalGeneration(Florence2 model) - fnet —
FNetForPreTraining(FNet model) - fsmt —
FSMTForConditionalGeneration(FairSeq Machine-Translation model) - funnel —
FunnelForPreTraining(Funnel Transformer model) - gemma3 —
Gemma3ForConditionalGeneration(Gemma3ForConditionalGeneration model) - gpt-sw3 —
GPT2LMHeadModel(GPT-Sw3 model) - gpt2 —
GPT2LMHeadModel(OpenAI GPT-2 model) - gpt_bigcode —
GPTBigCodeForCausalLM(GPTBigCode model) - gptsan-japanese —
GPTSanJapaneseForConditionalGeneration(GPTSAN-japanese model) - hiera —
HieraForPreTraining(Hiera model) - ibert —
IBertForMaskedLM(I-BERT model) - idefics —
IdeficsForVisionText2Text(IDEFICS model) - idefics2 —
Idefics2ForConditionalGeneration(Idefics2 model) - idefics3 —
Idefics3ForConditionalGeneration(Idefics3 model) - janus —
JanusForConditionalGeneration(Janus model) - layoutlm —
LayoutLMForMaskedLM(LayoutLM model) - llava —
LlavaForConditionalGeneration(LLaVa model) - llava_next —
LlavaNextForConditionalGeneration(LLaVA-NeXT model) - llava_next_video —
LlavaNextVideoForConditionalGeneration(LLaVa-NeXT-Video model) - llava_onevision —
LlavaOnevisionForConditionalGeneration(LLaVA-Onevision model) - longformer —
LongformerForMaskedLM(Longformer model) - luke —
LukeForMaskedLM(LUKE model) - lxmert —
LxmertForPreTraining(LXMERT model) - mamba —
MambaForCausalLM(Mamba model) - mamba2 —
Mamba2ForCausalLM(mamba2 model) - mega —
MegaForMaskedLM(MEGA model) - megatron-bert —
MegatronBertForPreTraining(Megatron-BERT model) - mistral3 —
Mistral3ForConditionalGeneration(Mistral3 model) - mllama —
MllamaForConditionalGeneration(Mllama model) - mobilebert —
MobileBertForPreTraining(MobileBERT model) - mpnet —
MPNetForMaskedLM(MPNet model) - mpt —
MptForCausalLM(MPT model) - mra —
MraForMaskedLM(MRA model) - mvp —
MvpForConditionalGeneration(MVP model) - nezha —
NezhaForPreTraining(Nezha model) - nllb-moe —
NllbMoeForConditionalGeneration(NLLB-MOE model) - openai-gpt —
OpenAIGPTLMHeadModel(OpenAI GPT model) - paligemma —
PaliGemmaForConditionalGeneration(PaliGemma model) - qwen2_audio —
Qwen2AudioForConditionalGeneration(Qwen2Audio model) - retribert —
RetriBertModel(RetriBERT model) - roberta —
RobertaForMaskedLM(RoBERTa model) - roberta-prelayernorm —
RobertaPreLayerNormForMaskedLM(RoBERTa-PreLayerNorm model) - roc_bert —
RoCBertForPreTraining(RoCBert model) - rwkv —
RwkvForCausalLM(RWKV model) - splinter —
SplinterForPreTraining(Splinter model) - squeezebert —
SqueezeBertForMaskedLM(SqueezeBERT model) - switch_transformers —
SwitchTransformersForConditionalGeneration(SwitchTransformers model) - t5 —
T5ForConditionalGeneration(T5 model) - t5gemma —
T5GemmaForConditionalGeneration(T5Gemma model) - tapas —
TapasForMaskedLM(TAPAS model) - transfo-xl —
TransfoXLLMHeadModel(Transformer-XL model) - tvlt —
TvltForPreTraining(TVLT model) - unispeech —
UniSpeechForPreTraining(UniSpeech model) - unispeech-sat —
UniSpeechSatForPreTraining(UniSpeechSat model) - video_llava —
VideoLlavaForConditionalGeneration(VideoLlava model) - videomae —
VideoMAEForPreTraining(VideoMAE model) - vipllava —
VipLlavaForConditionalGeneration(VipLlava model) - visual_bert —
VisualBertForPreTraining(VisualBERT model) - vit_mae —
ViTMAEForPreTraining(ViTMAE model) - voxtral —
VoxtralForConditionalGeneration(Voxtral model) - wav2vec2 —
Wav2Vec2ForPreTraining(Wav2Vec2 model) - wav2vec2-conformer —
Wav2Vec2ConformerForPreTraining(Wav2Vec2-Conformer model) - xlm —
XLMWithLMHeadModel(XLM model) - xlm-roberta —
XLMRobertaForMaskedLM(XLM-RoBERTa model) - xlm-roberta-xl —
XLMRobertaXLForMaskedLM(XLM-RoBERTa-XL model) - xlnet —
XLNetLMHeadModel(XLNet model) - xlstm —
xLSTMForCausalLM(xLSTM model) - xmod —
XmodForMaskedLM(X-MOD model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
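The eval/train toggle can be checked directly on the `training` attribute; a minimal offline sketch using a tiny, randomly initialized BERT (hypothetical sizes, nothing is downloaded):

```python
import tempfile

from transformers import AutoModelForPreTraining, BertConfig, BertForPreTraining

# Tiny, randomly initialized model used only to illustrate the toggle
# (hypothetical sizes; no pretrained weights are fetched).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)

with tempfile.TemporaryDirectory() as tmp:
    BertForPreTraining(config).save_pretrained(tmp)
    model = AutoModelForPreTraining.from_pretrained(tmp)

print(model.training)   # False: from_pretrained() called model.eval()
model.train()
print(model.training)   # True: dropout modules are active again
```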
Examples:
>>> from transformers import AutoConfig, AutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForPreTraining.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForPreTraining (ALBERT model)
- BartConfig configuration class: TFBartForConditionalGeneration (BART model)
- BertConfig configuration class: TFBertForPreTraining (BERT model)
- CTRLConfig configuration class: TFCTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: TFCamembertForMaskedLM (CamemBERT model)
DistilBertConfigconfiguration class:TFDistilBertForMaskedLM(DistilBERT model)ElectraConfigconfiguration class:TFElectraForPreTraining(ELECTRA model)FlaubertConfigconfiguration class:TFFlaubertWithLMHeadModel(FlauBERT model)FunnelConfigconfiguration class:TFFunnelForPreTraining(Funnel Transformer model)GPT2Configconfiguration class:TFGPT2LMHeadModel(OpenAI GPT-2 model)IdeficsConfigconfiguration class:TFIdeficsForVisionText2Text(IDEFICS model)LayoutLMConfigconfiguration class:TFLayoutLMForMaskedLM(LayoutLM model)LxmertConfigconfiguration class:TFLxmertForPreTraining(LXMERT model)MPNetConfigconfiguration class:TFMPNetForMaskedLM(MPNet model)MobileBertConfigconfiguration class:TFMobileBertForPreTraining(MobileBERT model)OpenAIGPTConfigconfiguration class:TFOpenAIGPTLMHeadModel(OpenAI GPT model)RobertaConfigconfiguration class:TFRobertaForMaskedLM(RoBERTa model)RobertaPreLayerNormConfigconfiguration class:TFRobertaPreLayerNormForMaskedLM(RoBERTa-PreLayerNorm model)T5Configconfiguration class:TFT5ForConditionalGeneration(T5 model)TapasConfigconfiguration class:TFTapasForMaskedLM(TAPAS model)TransfoXLConfigconfiguration class:TFTransfoXLLMHeadModel(Transformer-XL model)ViTMAEConfigconfiguration class:TFViTMAEForPreTraining(ViTMAE model)XLMConfigconfiguration class:TFXLMWithLMHeadModel(XLM model)XLMRobertaConfigconfiguration class:TFXLMRobertaForMaskedLM(XLM-RoBERTa model)XLNetConfigconfiguration class:TFXLNetLMHeadModel(XLNet model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case,from_ptshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForPreTraining (ALBERT model)
- bart — TFBartForConditionalGeneration (BART model)
- bert — TFBertForPreTraining (BERT model)
- camembert — TFCamembertForMaskedLM (CamemBERT model)
- ctrl — TFCTRLLMHeadModel (CTRL model)
- distilbert —
TFDistilBertForMaskedLM(DistilBERT model) - electra —
TFElectraForPreTraining(ELECTRA model) - flaubert —
TFFlaubertWithLMHeadModel(FlauBERT model) - funnel —
TFFunnelForPreTraining(Funnel Transformer model) - gpt-sw3 —
TFGPT2LMHeadModel(GPT-Sw3 model) - gpt2 —
TFGPT2LMHeadModel(OpenAI GPT-2 model) - idefics —
TFIdeficsForVisionText2Text(IDEFICS model) - layoutlm —
TFLayoutLMForMaskedLM(LayoutLM model) - lxmert —
TFLxmertForPreTraining(LXMERT model) - mobilebert —
TFMobileBertForPreTraining(MobileBERT model) - mpnet —
TFMPNetForMaskedLM(MPNet model) - openai-gpt —
TFOpenAIGPTLMHeadModel(OpenAI GPT model) - roberta —
TFRobertaForMaskedLM(RoBERTa model) - roberta-prelayernorm —
TFRobertaPreLayerNormForMaskedLM(RoBERTa-PreLayerNorm model) - t5 —
TFT5ForConditionalGeneration(T5 model) - tapas —
TFTapasForMaskedLM(TAPAS model) - transfo-xl —
TFTransfoXLLMHeadModel(Transformer-XL model) - vit_mae —
TFViTMAEForPreTraining(ViTMAE model) - xlm —
TFXLMWithLMHeadModel(XLM model) - xlm-roberta —
TFXLMRobertaForMaskedLM(XLM-RoBERTa model) - xlnet —
TFXLNetLMHeadModel(XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForPreTraining.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForPreTraining (ALBERT model)
- BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
- BertConfig configuration class: FlaxBertForPreTraining (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForPreTraining (BigBird model)
ElectraConfigconfiguration class:FlaxElectraForPreTraining(ELECTRA model)LongT5Configconfiguration class:FlaxLongT5ForConditionalGeneration(LongT5 model)MBartConfigconfiguration class:FlaxMBartForConditionalGeneration(mBART model)MT5Configconfiguration class:FlaxMT5ForConditionalGeneration(MT5 model)RoFormerConfigconfiguration class:FlaxRoFormerForMaskedLM(RoFormer model)RobertaConfigconfiguration class:FlaxRobertaForMaskedLM(RoBERTa model)RobertaPreLayerNormConfigconfiguration class:FlaxRobertaPreLayerNormForMaskedLM(RoBERTa-PreLayerNorm model)T5Configconfiguration class:FlaxT5ForConditionalGeneration(T5 model)Wav2Vec2Configconfiguration class:FlaxWav2Vec2ForPreTraining(Wav2Vec2 model)WhisperConfigconfiguration class:FlaxWhisperForConditionalGeneration(Whisper model)XLMRobertaConfigconfiguration class:FlaxXLMRobertaForMaskedLM(XLM-RoBERTa model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case,from_ptshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the PyTorch model to a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForPreTraining (ALBERT model)
- bart — FlaxBartForConditionalGeneration (BART model)
- bert — FlaxBertForPreTraining (BERT model)
- big_bird — FlaxBigBirdForPreTraining (BigBird model)
- electra —
FlaxElectraForPreTraining(ELECTRA model) - longt5 —
FlaxLongT5ForConditionalGeneration(LongT5 model) - mbart —
FlaxMBartForConditionalGeneration(mBART model) - mt5 —
FlaxMT5ForConditionalGeneration(MT5 model) - roberta —
FlaxRobertaForMaskedLM(RoBERTa model) - roberta-prelayernorm —
FlaxRobertaPreLayerNormForMaskedLM(RoBERTa-PreLayerNorm model) - roformer —
FlaxRoFormerForMaskedLM(RoFormer model) - t5 —
FlaxT5ForConditionalGeneration(T5 model) - wav2vec2 —
FlaxWav2Vec2ForPreTraining(Wav2Vec2 model) - whisper —
FlaxWhisperForConditionalGeneration(Whisper model) - xlm-roberta —
FlaxXLMRobertaForMaskedLM(XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForPreTraining.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
Natural Language Processing
The following auto classes are available for the natural language processing tasks listed below.
AutoModelForCausalLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
ApertusConfigconfiguration class:ApertusForCausalLM(Apertus model)ArceeConfigconfiguration class:ArceeForCausalLM(Arcee model)AriaTextConfigconfiguration class:AriaTextForCausalLM(AriaText model)BambaConfigconfiguration class:BambaForCausalLM(Bamba model)- BartConfig configuration class: BartForCausalLM (BART model)
- BertConfig configuration class: BertLMHeadModel (BERT model)
- BertGenerationConfig configuration class: BertGenerationDecoder (Bert Generation model)
- BigBirdConfig configuration class: BigBirdForCausalLM (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusForCausalLM (BigBird-Pegasus model)
- BioGptConfig configuration class: BioGptForCausalLM (BioGpt model)
BitNetConfigconfiguration class:BitNetForCausalLM(BitNet model)- BlenderbotConfig configuration class: BlenderbotForCausalLM (Blenderbot model)
- BlenderbotSmallConfig configuration class: BlenderbotSmallForCausalLM (BlenderbotSmall model)
- BloomConfig configuration class: BloomForCausalLM (BLOOM model)
BltConfigconfiguration class:BltForCausalLM(Blt model)- CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: CamembertForCausalLM (CamemBERT model)
- CodeGenConfig configuration class: CodeGenForCausalLM (CodeGen model)
Cohere2Configconfiguration class:Cohere2ForCausalLM(Cohere2 model)CohereConfigconfiguration class:CohereForCausalLM(Cohere model)- CpmAntConfig configuration class: CpmAntForCausalLM (CPM-Ant model)
- Data2VecTextConfig configuration class: Data2VecTextForCausalLM (Data2VecText model)
- DbrxConfig configuration class: DbrxForCausalLM (DBRX model)
- DeepseekV2Config configuration class: DeepseekV2ForCausalLM (DeepSeek-V2 model)
- DeepseekV3Config configuration class: DeepseekV3ForCausalLM (DeepSeek-V3 model)
- DiffLlamaConfig configuration class: DiffLlamaForCausalLM (DiffLlama model)
- DogeConfig configuration class: DogeForCausalLM (Doge model)
- Dots1Config configuration class: Dots1ForCausalLM (dots1 model)
- ElectraConfig configuration class: ElectraForCausalLM (ELECTRA model)
- Emu3Config configuration class: Emu3ForCausalLM (Emu3 model)
- Ernie4_5Config configuration class: Ernie4_5ForCausalLM (Ernie4_5 model)
- Ernie4_5_MoeConfig configuration class: Ernie4_5_MoeForCausalLM (Ernie4_5_MoE model)
- ErnieConfig configuration class: ErnieForCausalLM (ERNIE model)
- Exaone4Config configuration class: Exaone4ForCausalLM (EXAONE-4.0 model)
- FalconConfig configuration class: FalconForCausalLM (Falcon model)
- FalconH1Config configuration class: FalconH1ForCausalLM (FalconH1 model)
- FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
- FlexOlmoConfig configuration class: FlexOlmoForCausalLM (FlexOlmo model)
- FuyuConfig configuration class: FuyuForCausalLM (Fuyu model)
- GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
- GPTJConfig configuration class: GPTJForCausalLM (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoForCausalLM (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXForCausalLM (GPT NeoX model)
- GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
- Gemma2Config configuration class: Gemma2ForCausalLM (Gemma2 model)
- Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
- Gemma3TextConfig configuration class: Gemma3ForCausalLM (Gemma3ForCausalLM model)
- Gemma3nConfig configuration class: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
- Gemma3nTextConfig configuration class: Gemma3nForCausalLM (Gemma3nForCausalLM model)
- GemmaConfig configuration class: GemmaForCausalLM (Gemma model)
- GitConfig configuration class: GitForCausalLM (GIT model)
- Glm4Config configuration class: Glm4ForCausalLM (GLM4 model)
- Glm4MoeConfig configuration class: Glm4MoeForCausalLM (Glm4MoE model)
- GlmConfig configuration class: GlmForCausalLM (GLM model)
- GotOcr2Config configuration class: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
- GptOssConfig configuration class: GptOssForCausalLM (GptOss model)
- GraniteConfig configuration class: GraniteForCausalLM (Granite model)
- GraniteMoeConfig configuration class: GraniteMoeForCausalLM (GraniteMoe model)
- GraniteMoeHybridConfig configuration class: GraniteMoeHybridForCausalLM (GraniteMoeHybrid model)
- GraniteMoeSharedConfig configuration class: GraniteMoeSharedForCausalLM (GraniteMoeShared model)
- HeliumConfig configuration class: HeliumForCausalLM (Helium model)
- HunYuanDenseV1Config configuration class: HunYuanDenseV1ForCausalLM (HunYuanDenseV1 model)
- HunYuanMoEV1Config configuration class: HunYuanMoEV1ForCausalLM (HunYuanMoeV1 model)
- JambaConfig configuration class: JambaForCausalLM (Jamba model)
- JetMoeConfig configuration class: JetMoeForCausalLM (JetMoe model)
- Lfm2Config configuration class: Lfm2ForCausalLM (Lfm2 model)
- Llama4Config configuration class: Llama4ForCausalLM (Llama4 model)
- Llama4TextConfig configuration class: Llama4ForCausalLM (Llama4ForCausalLM model)
- LlamaConfig configuration class: LlamaForCausalLM (LLaMA model)
- LongcatFlashConfig configuration class: LongcatFlashForCausalLM (LongCatFlash model)
- MBartConfig configuration class: MBartForCausalLM (mBART model)
- Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
- MambaConfig configuration class: MambaForCausalLM (Mamba model)
- MarianConfig configuration class: MarianForCausalLM (Marian model)
- MegaConfig configuration class: MegaForCausalLM (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForCausalLM (Megatron-BERT model)
- MiniMaxConfig configuration class: MiniMaxForCausalLM (MiniMax model)
- MinistralConfig configuration class: MinistralForCausalLM (Ministral model)
- MistralConfig configuration class: MistralForCausalLM (Mistral model)
- MixtralConfig configuration class: MixtralForCausalLM (Mixtral model)
- MllamaConfig configuration class: MllamaForCausalLM (Mllama model)
- ModernBertDecoderConfig configuration class: ModernBertDecoderForCausalLM (ModernBertDecoder model)
- MoshiConfig configuration class: MoshiForCausalLM (Moshi model)
- MptConfig configuration class: MptForCausalLM (MPT model)
- MusicgenConfig configuration class: MusicgenForCausalLM (MusicGen model)
- MusicgenMelodyConfig configuration class: MusicgenMelodyForCausalLM (MusicGen Melody model)
- MvpConfig configuration class: MvpForCausalLM (MVP model)
- NemotronConfig configuration class: NemotronForCausalLM (Nemotron model)
- OPTConfig configuration class: OPTForCausalLM (OPT model)
- Olmo2Config configuration class: Olmo2ForCausalLM (OLMo2 model)
- Olmo3Config configuration class: Olmo3ForCausalLM (Olmo3 model)
- OlmoConfig configuration class: OlmoForCausalLM (OLMo model)
- OlmoeConfig configuration class: OlmoeForCausalLM (OLMoE model)
- OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
- OpenLlamaConfig configuration class: OpenLlamaForCausalLM (OpenLlama model)
- PLBartConfig configuration class: PLBartForCausalLM (PLBart model)
- PegasusConfig configuration class: PegasusForCausalLM (Pegasus model)
- PersimmonConfig configuration class: PersimmonForCausalLM (Persimmon model)
- Phi3Config configuration class: Phi3ForCausalLM (Phi3 model)
- Phi4MultimodalConfig configuration class: Phi4MultimodalForCausalLM (Phi4Multimodal model)
- PhiConfig configuration class: PhiForCausalLM (Phi model)
- PhimoeConfig configuration class: PhimoeForCausalLM (Phimoe model)
- ProphetNetConfig configuration class: ProphetNetForCausalLM (ProphetNet model)
- QDQBertConfig configuration class: QDQBertLMHeadModel (QDQBert model)
- Qwen2Config configuration class: Qwen2ForCausalLM (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeForCausalLM (Qwen2MoE model)
- Qwen3Config configuration class: Qwen3ForCausalLM (Qwen3 model)
- Qwen3MoeConfig configuration class: Qwen3MoeForCausalLM (Qwen3MoE model)
- Qwen3NextConfig configuration class: Qwen3NextForCausalLM (Qwen3Next model)
- RecurrentGemmaConfig configuration class: RecurrentGemmaForCausalLM (RecurrentGemma model)
- ReformerConfig configuration class: ReformerModelWithLMHead (Reformer model)
- RemBertConfig configuration class: RemBertForCausalLM (RemBERT model)
- RoCBertConfig configuration class: RoCBertForCausalLM (RoCBert model)
- RoFormerConfig configuration class: RoFormerForCausalLM (RoFormer model)
- RobertaConfig configuration class: RobertaForCausalLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
- SeedOssConfig configuration class: SeedOssForCausalLM (SeedOss model)
- SmolLM3Config configuration class: SmolLM3ForCausalLM (SmolLM3 model)
- Speech2Text2Config configuration class: Speech2Text2ForCausalLM (Speech2Text2 model)
- StableLmConfig configuration class: StableLmForCausalLM (StableLm model)
- Starcoder2Config configuration class: Starcoder2ForCausalLM (Starcoder2 model)
- TrOCRConfig configuration class: TrOCRForCausalLM (TrOCR model)
- TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
- VaultGemmaConfig configuration class: VaultGemmaForCausalLM (VaultGemma model)
- WhisperConfig configuration class: WhisperForCausalLM (Whisper model)
- XGLMConfig configuration class: XGLMForCausalLM (XGLM model)
- XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
- XLMProphetNetConfig configuration class: XLMProphetNetForCausalLM (XLM-ProphetNet model)
- XLMRobertaConfig configuration class: XLMRobertaForCausalLM (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
- XmodConfig configuration class: XmodForCausalLM (X-MOD model)
- Zamba2Config configuration class: Zamba2ForCausalLM (Zamba2 model)
- ZambaConfig configuration class: ZambaForCausalLM (Zamba model)
- xLSTMConfig configuration class: xLSTMForCausalLM (xLSTM model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
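As a concrete illustration of the note above, a model can be built from a locally constructed configuration without downloading any weights; assuming transformers and torch are installed, the size overrides below are hypothetical values chosen only to keep the example small:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Build a GPT-2 config locally (no download); n_layer/n_head/n_embd are
# illustrative overrides to keep the randomly initialized model tiny.
config = AutoConfig.for_model("gpt2", n_layer=2, n_head=2, n_embd=64)

# from_config() instantiates the architecture with random weights --
# use from_pretrained() instead when you want the trained weights.
model = AutoModelForCausalLM.from_config(config)
print(type(model).__name__)  # GPT2LMHeadModel
```

Because the weights are freshly initialized, such a model is useful for testing shapes and pipelines, not for generation quality.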
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- apertus — ApertusForCausalLM (Apertus model)
- arcee — ArceeForCausalLM (Arcee model)
- aria_text — AriaTextForCausalLM (AriaText model)
- bamba — BambaForCausalLM (Bamba model)
- bart — BartForCausalLM (BART model)
- bert — BertLMHeadModel (BERT model)
- bert-generation — BertGenerationDecoder (Bert Generation model)
- big_bird — BigBirdForCausalLM (BigBird model)
- bigbird_pegasus — BigBirdPegasusForCausalLM (BigBird-Pegasus model)
- biogpt — BioGptForCausalLM (BioGpt model)
- bitnet — BitNetForCausalLM (BitNet model)
- blenderbot — BlenderbotForCausalLM (Blenderbot model)
- blenderbot-small — BlenderbotSmallForCausalLM (BlenderbotSmall model)
- bloom — BloomForCausalLM (BLOOM model)
- blt — BltForCausalLM (Blt model)
- camembert — CamembertForCausalLM (CamemBERT model)
- code_llama — LlamaForCausalLM (CodeLlama model)
- codegen — CodeGenForCausalLM (CodeGen model)
- cohere — CohereForCausalLM (Cohere model)
- cohere2 — Cohere2ForCausalLM (Cohere2 model)
- cpmant — CpmAntForCausalLM (CPM-Ant model)
- ctrl — CTRLLMHeadModel (CTRL model)
- data2vec-text — Data2VecTextForCausalLM (Data2VecText model)
- dbrx — DbrxForCausalLM (DBRX model)
- deepseek_v2 — DeepseekV2ForCausalLM (DeepSeek-V2 model)
- deepseek_v3 — DeepseekV3ForCausalLM (DeepSeek-V3 model)
- diffllama — DiffLlamaForCausalLM (DiffLlama model)
- doge — DogeForCausalLM (Doge model)
- dots1 — Dots1ForCausalLM (dots1 model)
- electra — ElectraForCausalLM (ELECTRA model)
- emu3 — Emu3ForCausalLM (Emu3 model)
- ernie — ErnieForCausalLM (ERNIE model)
- ernie4_5 — Ernie4_5ForCausalLM (Ernie4_5 model)
- ernie4_5_moe — Ernie4_5_MoeForCausalLM (Ernie4_5_MoE model)
- exaone4 — Exaone4ForCausalLM (EXAONE-4.0 model)
- falcon — FalconForCausalLM (Falcon model)
- falcon_h1 — FalconH1ForCausalLM (FalconH1 model)
- falcon_mamba — FalconMambaForCausalLM (FalconMamba model)
- flex_olmo — FlexOlmoForCausalLM (FlexOlmo model)
- fuyu — FuyuForCausalLM (Fuyu model)
- gemma — GemmaForCausalLM (Gemma model)
- gemma2 — Gemma2ForCausalLM (Gemma2 model)
- gemma3 — Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
- gemma3_text — Gemma3ForCausalLM (Gemma3ForCausalLM model)
- gemma3n — Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
- gemma3n_text — Gemma3nForCausalLM (Gemma3nForCausalLM model)
- git — GitForCausalLM (GIT model)
- glm — GlmForCausalLM (GLM model)
- glm4 — Glm4ForCausalLM (GLM4 model)
- glm4_moe — Glm4MoeForCausalLM (Glm4MoE model)
- got_ocr2 — GotOcr2ForConditionalGeneration (GOT-OCR2 model)
- gpt-sw3 — GPT2LMHeadModel (GPT-Sw3 model)
- gpt2 — GPT2LMHeadModel (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeForCausalLM (GPTBigCode model)
- gpt_neo — GPTNeoForCausalLM (GPT Neo model)
- gpt_neox — GPTNeoXForCausalLM (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
- gpt_oss — GptOssForCausalLM (GptOss model)
- gptj — GPTJForCausalLM (GPT-J model)
- granite — GraniteForCausalLM (Granite model)
- granitemoe — GraniteMoeForCausalLM (GraniteMoe model)
- granitemoehybrid — GraniteMoeHybridForCausalLM (GraniteMoeHybrid model)
- granitemoeshared — GraniteMoeSharedForCausalLM (GraniteMoeShared model)
- helium — HeliumForCausalLM (Helium model)
- hunyuan_v1_dense — HunYuanDenseV1ForCausalLM (HunYuanDenseV1 model)
- hunyuan_v1_moe — HunYuanMoEV1ForCausalLM (HunYuanMoeV1 model)
- jamba — JambaForCausalLM (Jamba model)
- jetmoe — JetMoeForCausalLM (JetMoe model)
- lfm2 — Lfm2ForCausalLM (Lfm2 model)
- llama — LlamaForCausalLM (LLaMA model)
- llama4 — Llama4ForCausalLM (Llama4 model)
- llama4_text — Llama4ForCausalLM (Llama4ForCausalLM model)
- longcat_flash — LongcatFlashForCausalLM (LongCatFlash model)
- mamba — MambaForCausalLM (Mamba model)
- mamba2 — Mamba2ForCausalLM (mamba2 model)
- marian — MarianForCausalLM (Marian model)
- mbart — MBartForCausalLM (mBART model)
- mega — MegaForCausalLM (MEGA model)
- megatron-bert — MegatronBertForCausalLM (Megatron-BERT model)
- minimax — MiniMaxForCausalLM (MiniMax model)
- ministral — MinistralForCausalLM (Ministral model)
- mistral — MistralForCausalLM (Mistral model)
- mixtral — MixtralForCausalLM (Mixtral model)
- mllama — MllamaForCausalLM (Mllama model)
- modernbert-decoder — ModernBertDecoderForCausalLM (ModernBertDecoder model)
- moshi — MoshiForCausalLM (Moshi model)
- mpt — MptForCausalLM (MPT model)
- musicgen — MusicgenForCausalLM (MusicGen model)
- musicgen_melody — MusicgenMelodyForCausalLM (MusicGen Melody model)
- mvp — MvpForCausalLM (MVP model)
- nemotron — NemotronForCausalLM (Nemotron model)
- olmo — OlmoForCausalLM (OLMo model)
- olmo2 — Olmo2ForCausalLM (OLMo2 model)
- olmo3 — Olmo3ForCausalLM (Olmo3 model)
- olmoe — OlmoeForCausalLM (OLMoE model)
- open-llama — OpenLlamaForCausalLM (OpenLlama model)
- openai-gpt — OpenAIGPTLMHeadModel (OpenAI GPT model)
- opt — OPTForCausalLM (OPT model)
- pegasus — PegasusForCausalLM (Pegasus model)
- persimmon — PersimmonForCausalLM (Persimmon model)
- phi — PhiForCausalLM (Phi model)
- phi3 — Phi3ForCausalLM (Phi3 model)
- phi4_multimodal — Phi4MultimodalForCausalLM (Phi4Multimodal model)
- phimoe — PhimoeForCausalLM (Phimoe model)
- plbart — PLBartForCausalLM (PLBart model)
- prophetnet — ProphetNetForCausalLM (ProphetNet model)
- qdqbert — QDQBertLMHeadModel (QDQBert model)
- qwen2 — Qwen2ForCausalLM (Qwen2 model)
- qwen2_moe — Qwen2MoeForCausalLM (Qwen2MoE model)
- qwen3 — Qwen3ForCausalLM (Qwen3 model)
- qwen3_moe — Qwen3MoeForCausalLM (Qwen3MoE model)
- qwen3_next — Qwen3NextForCausalLM (Qwen3Next model)
- recurrent_gemma — RecurrentGemmaForCausalLM (RecurrentGemma model)
- reformer — ReformerModelWithLMHead (Reformer model)
- rembert — RemBertForCausalLM (RemBERT model)
- roberta — RobertaForCausalLM (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForCausalLM (RoCBert model)
- roformer — RoFormerForCausalLM (RoFormer model)
- rwkv — RwkvForCausalLM (RWKV model)
- seed_oss — SeedOssForCausalLM (SeedOss model)
- smollm3 — SmolLM3ForCausalLM (SmolLM3 model)
- speech_to_text_2 — Speech2Text2ForCausalLM (Speech2Text2 model)
- stablelm — StableLmForCausalLM (StableLm model)
- starcoder2 — Starcoder2ForCausalLM (Starcoder2 model)
- transfo-xl — TransfoXLLMHeadModel (Transformer-XL model)
- trocr — TrOCRForCausalLM (TrOCR model)
- vaultgemma — VaultGemmaForCausalLM (VaultGemma model)
- whisper — WhisperForCausalLM (Whisper model)
- xglm — XGLMForCausalLM (XGLM model)
- xlm — XLMWithLMHeadModel (XLM model)
- xlm-prophetnet — XLMProphetNetForCausalLM (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaForCausalLM (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
- xlnet — XLNetLMHeadModel (XLNet model)
- xlstm — xLSTMForCausalLM (xLSTM model)
- xmod — XmodForCausalLM (X-MOD model)
- zamba — ZambaForCausalLM (Zamba model)
- zamba2 — Zamba2ForCausalLM (Zamba2 model)
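The selection rule above (dispatch on the config's model_type when available, otherwise pattern matching on the name or path) can be sketched with a plain-Python stand-in; the registry and names below are hypothetical, not the actual transformers internals:

```python
# Hypothetical miniature of the Auto-class dispatch; the real library keeps
# a lazy mapping from model_type keys to the classes listed above.
REGISTRY = {
    "gpt2": "GPT2LMHeadModel",
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
}

def resolve_model_class(name_or_path, model_type=None):
    # 1) Prefer the config's model_type when it is available.
    if model_type is not None:
        return REGISTRY[model_type]
    # 2) Fall back to pattern matching on the name/path; try longer keys
    #    first so a more specific key is not shadowed by a shorter one.
    for key in sorted(REGISTRY, key=len, reverse=True):
        if key in name_or_path:
            return REGISTRY[key]
    raise ValueError(f"Unrecognized model: {name_or_path}")

print(resolve_model_class("openai-community/gpt2"))         # GPT2LMHeadModel
print(resolve_model_class("anything", model_type="llama"))  # LlamaForCausalLM
```

This is why a missing or wrong model_type in a custom config can silently change which class gets instantiated: the fallback only sees the repo name.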
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForCausalLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForCausalLM.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForCausalLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BertConfig configuration class: TFBertLMHeadModel (BERT model)
- CTRLConfig configuration class: TFCTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: TFCamembertForCausalLM (CamemBERT model)
- GPT2Config configuration class: TFGPT2LMHeadModel (OpenAI GPT-2 model)
- GPTJConfig configuration class: TFGPTJForCausalLM (GPT-J model)
- MistralConfig configuration class: TFMistralForCausalLM (Mistral model)
- OPTConfig configuration class: TFOPTForCausalLM (OPT model)
- OpenAIGPTConfig configuration class: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
- RemBertConfig configuration class: TFRemBertForCausalLM (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForCausalLM (RoFormer model)
- RobertaConfig configuration class: TFRobertaForCausalLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- TransfoXLConfig configuration class: TFTransfoXLLMHeadModel (Transformer-XL model)
- XGLMConfig configuration class: TFXGLMForCausalLM (XGLM model)
- XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForCausalLM (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetLMHeadModel (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bert — TFBertLMHeadModel (BERT model)
- camembert — TFCamembertForCausalLM (CamemBERT model)
- ctrl — TFCTRLLMHeadModel (CTRL model)
- gpt-sw3 — TFGPT2LMHeadModel (GPT-Sw3 model)
- gpt2 — TFGPT2LMHeadModel (OpenAI GPT-2 model)
- gptj — TFGPTJForCausalLM (GPT-J model)
- mistral — TFMistralForCausalLM (Mistral model)
- openai-gpt — TFOpenAIGPTLMHeadModel (OpenAI GPT model)
- opt — TFOPTForCausalLM (OPT model)
- rembert — TFRemBertForCausalLM (RemBERT model)
- roberta — TFRobertaForCausalLM (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerForCausalLM (RoFormer model)
- transfo-xl — TFTransfoXLLMHeadModel (Transformer-XL model)
- xglm — TFXGLMForCausalLM (XGLM model)
- xlm — TFXLMWithLMHeadModel (XLM model)
- xlm-roberta — TFXLMRobertaForCausalLM (XLM-RoBERTa model)
- xlnet — TFXLNetLMHeadModel (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForCausalLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForCausalLM.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForCausalLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BartConfig configuration class: FlaxBartForCausalLM (BART model)
- BertConfig configuration class: FlaxBertForCausalLM (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForCausalLM (BigBird model)
- BloomConfig configuration class: FlaxBloomForCausalLM (BLOOM model)
- ElectraConfig configuration class: FlaxElectraForCausalLM (ELECTRA model)
- GPT2Config configuration class: FlaxGPT2LMHeadModel (OpenAI GPT-2 model)
- GPTJConfig configuration class: FlaxGPTJForCausalLM (GPT-J model)
- GPTNeoConfig configuration class: FlaxGPTNeoForCausalLM (GPT Neo model)
- GemmaConfig configuration class: FlaxGemmaForCausalLM (Gemma model)
- LlamaConfig configuration class: FlaxLlamaForCausalLM (LLaMA model)
- MistralConfig configuration class: FlaxMistralForCausalLM (Mistral model)
- OPTConfig configuration class: FlaxOPTForCausalLM (OPT model)
- RobertaConfig configuration class: FlaxRobertaForCausalLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- XGLMConfig configuration class: FlaxXGLMForCausalLM (XGLM model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForCausalLM (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g,
./pt_model/pytorch_model.bin). In this case,from_ptshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the PyTorch model in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code leaves in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
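The kwargs-routing rule above can be sketched as follows. This is a simplified, illustrative stand-in, not the actual Transformers implementation; the `ToyConfig`/`ToyModel` names and `custom_flag` parameter are hypothetical:

```python
# Sketch of how an Auto class routes **kwargs in from_pretrained:
# with an explicit config, kwargs go straight to the model's __init__;
# without one, kwargs matching config attributes update the config and
# the remainder falls through to the model. Illustrative only.

class ToyConfig:
    """Stands in for a PretrainedConfig with a couple of attributes."""
    def __init__(self, output_attentions=False, hidden_size=16):
        self.output_attentions = output_attentions
        self.hidden_size = hidden_size

class ToyModel:
    def __init__(self, config, **model_kwargs):
        self.config = config
        self.model_kwargs = model_kwargs

def from_pretrained(config=None, **kwargs):
    if config is not None:
        # Config supplied: all kwargs are passed to the model's __init__.
        return ToyModel(config, **kwargs)
    # No config: split kwargs between config attributes and model kwargs.
    config = ToyConfig()
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)
        else:
            model_kwargs[key] = value
    return ToyModel(config, **model_kwargs)

model = from_pretrained(output_attentions=True, custom_flag=1)
```

Here `output_attentions` lands on the config while the unknown `custom_flag` is forwarded to the model's `__init__`.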
Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bart — FlaxBartForCausalLM (BART model)
- bert — FlaxBertForCausalLM (BERT model)
- big_bird — FlaxBigBirdForCausalLM (BigBird model)
- bloom — FlaxBloomForCausalLM (BLOOM model)
- electra — FlaxElectraForCausalLM (ELECTRA model)
- gemma — FlaxGemmaForCausalLM (Gemma model)
- gpt-sw3 — FlaxGPT2LMHeadModel (GPT-Sw3 model)
- gpt2 — FlaxGPT2LMHeadModel (OpenAI GPT-2 model)
- gpt_neo — FlaxGPTNeoForCausalLM (GPT Neo model)
- gptj — FlaxGPTJForCausalLM (GPT-J model)
- llama — FlaxLlamaForCausalLM (LLaMA model)
- mistral — FlaxMistralForCausalLM (Mistral model)
- opt — FlaxOPTForCausalLM (OPT model)
- roberta — FlaxRobertaForCausalLM (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- xglm — FlaxXGLMForCausalLM (XGLM model)
- xlm-roberta — FlaxXLMRobertaForCausalLM (XLM-RoBERTa model)
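The selection logic described above (dispatch on the config's `model_type`, with a fallback to pattern matching on the name or path) can be sketched as a small registry. This is an illustrative simplification, not the actual Transformers code, and the real fallback matching is more careful than a plain substring test:

```python
# Minimal sketch of Auto-class selection: prefer config.model_type,
# fall back to pattern matching on pretrained_model_name_or_path.
# Illustrative only; a tiny subset of the real mapping.

MODEL_MAPPING = {
    "bert": "FlaxBertForCausalLM",
    "gpt2": "FlaxGPT2LMHeadModel",
    "llama": "FlaxLlamaForCausalLM",
}

def resolve_model_class(name_or_path, model_type=None):
    # Prefer the explicit model_type from the loaded config.
    if model_type is not None:
        return MODEL_MAPPING[model_type]
    # Fallback: pattern-match mapping keys against the name/path.
    for key, cls in MODEL_MAPPING.items():
        if key in name_or_path:
            return cls
    raise ValueError(f"Could not infer model type from {name_or_path!r}")

cls = resolve_model_class("google-bert/bert-base-cased")
```

For `"google-bert/bert-base-cased"` with no config, the `"bert"` key matches the path and selects the BERT class.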
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForCausalLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForCausalLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForCausalLM.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForMaskedLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForMaskedLM (ALBERT model)
- BartConfig configuration class: BartForConditionalGeneration (BART model)
- BertConfig configuration class: BertForMaskedLM (BERT model)
- BigBirdConfig configuration class: BigBirdForMaskedLM (BigBird model)
- CamembertConfig configuration class: CamembertForMaskedLM (CamemBERT model)
- ConvBertConfig configuration class: ConvBertForMaskedLM (ConvBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForMaskedLM (Data2VecText model)
- DebertaConfig configuration class: DebertaForMaskedLM (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForMaskedLM (DeBERTa-v2 model)
- DistilBertConfig configuration class: DistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: ElectraForMaskedLM (ELECTRA model)
- ErnieConfig configuration class: ErnieForMaskedLM (ERNIE model)
- EsmConfig configuration class: EsmForMaskedLM (ESM model)
- FNetConfig configuration class: FNetForMaskedLM (FNet model)
- FlaubertConfig configuration class: FlaubertWithLMHeadModel (FlauBERT model)
- FunnelConfig configuration class: FunnelForMaskedLM (Funnel Transformer model)
- IBertConfig configuration class: IBertForMaskedLM (I-BERT model)
- LayoutLMConfig configuration class: LayoutLMForMaskedLM (LayoutLM model)
- LongformerConfig configuration class: LongformerForMaskedLM (Longformer model)
- LukeConfig configuration class: LukeForMaskedLM (LUKE model)
- MBartConfig configuration class: MBartForConditionalGeneration (mBART model)
- MPNetConfig configuration class: MPNetForMaskedLM (MPNet model)
- MegaConfig configuration class: MegaForMaskedLM (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForMaskedLM (Megatron-BERT model)
- MobileBertConfig configuration class: MobileBertForMaskedLM (MobileBERT model)
- ModernBertConfig configuration class: ModernBertForMaskedLM (ModernBERT model)
- MraConfig configuration class: MraForMaskedLM (MRA model)
- MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
- NezhaConfig configuration class: NezhaForMaskedLM (Nezha model)
- NystromformerConfig configuration class: NystromformerForMaskedLM (Nyströmformer model)
- PerceiverConfig configuration class: PerceiverForMaskedLM (Perceiver model)
- QDQBertConfig configuration class: QDQBertForMaskedLM (QDQBert model)
- ReformerConfig configuration class: ReformerForMaskedLM (Reformer model)
- RemBertConfig configuration class: RemBertForMaskedLM (RemBERT model)
- RoCBertConfig configuration class: RoCBertForMaskedLM (RoCBert model)
- RoFormerConfig configuration class: RoFormerForMaskedLM (RoFormer model)
- RobertaConfig configuration class: RobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- SqueezeBertConfig configuration class: SqueezeBertForMaskedLM (SqueezeBERT model)
- TapasConfig configuration class: TapasForMaskedLM (TAPAS model)
- Wav2Vec2Config configuration class: Wav2Vec2ForMaskedLM (Wav2Vec2 model)
- XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForMaskedLM (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
- XmodConfig configuration class: XmodForMaskedLM (X-MOD model)
- YosoConfig configuration class: YosoForMaskedLM (YOSO model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
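The distinction in the note above (a configuration gives you architecture but not weights) can be sketched with toy classes. These are illustrative stand-ins, not the Transformers API; the `checkpoint` dict and `weight` attribute are hypothetical:

```python
# Sketch of from_config vs from_pretrained: from_config yields freshly
# (randomly) initialized weights, while from_pretrained overwrites them
# with values restored from a checkpoint. Toy classes, illustrative only.

import random

class ToyModel:
    def __init__(self, config):
        self.config = config
        # Fresh models start with randomly initialized weights.
        self.weight = random.random()

    @classmethod
    def from_config(cls, config):
        # Architecture only: weights remain random.
        return cls(config)

    @classmethod
    def from_pretrained(cls, checkpoint, config):
        model = cls(config)
        # The random init is replaced by the saved weights.
        model.weight = checkpoint["weight"]
        return model

checkpoint = {"weight": 0.42}
restored = ToyModel.from_pretrained(checkpoint, config={})
```

`restored.weight` is exactly the checkpointed value, whereas a `from_config` model would carry an arbitrary random value until trained or loaded.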
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults toFalse) — Load the model weights from a TensorFlow checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForMaskedLM (ALBERT model)
- bart — BartForConditionalGeneration (BART model)
- bert — BertForMaskedLM (BERT model)
- big_bird — BigBirdForMaskedLM (BigBird model)
- camembert — CamembertForMaskedLM (CamemBERT model)
- convbert — ConvBertForMaskedLM (ConvBERT model)
- data2vec-text — Data2VecTextForMaskedLM (Data2VecText model)
- deberta — DebertaForMaskedLM (DeBERTa model)
- deberta-v2 — DebertaV2ForMaskedLM (DeBERTa-v2 model)
- distilbert — DistilBertForMaskedLM (DistilBERT model)
- electra — ElectraForMaskedLM (ELECTRA model)
- ernie — ErnieForMaskedLM (ERNIE model)
- esm — EsmForMaskedLM (ESM model)
- flaubert — FlaubertWithLMHeadModel (FlauBERT model)
- fnet — FNetForMaskedLM (FNet model)
- funnel — FunnelForMaskedLM (Funnel Transformer model)
- ibert — IBertForMaskedLM (I-BERT model)
- layoutlm — LayoutLMForMaskedLM (LayoutLM model)
- longformer — LongformerForMaskedLM (Longformer model)
- luke — LukeForMaskedLM (LUKE model)
- mbart — MBartForConditionalGeneration (mBART model)
- mega — MegaForMaskedLM (MEGA model)
- megatron-bert — MegatronBertForMaskedLM (Megatron-BERT model)
- mobilebert — MobileBertForMaskedLM (MobileBERT model)
- modernbert — ModernBertForMaskedLM (ModernBERT model)
- mpnet — MPNetForMaskedLM (MPNet model)
- mra — MraForMaskedLM (MRA model)
- mvp — MvpForConditionalGeneration (MVP model)
- nezha — NezhaForMaskedLM (Nezha model)
- nystromformer — NystromformerForMaskedLM (Nyströmformer model)
- perceiver — PerceiverForMaskedLM (Perceiver model)
- qdqbert — QDQBertForMaskedLM (QDQBert model)
- reformer — ReformerForMaskedLM (Reformer model)
- rembert — RemBertForMaskedLM (RemBERT model)
- roberta — RobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForMaskedLM (RoCBert model)
- roformer — RoFormerForMaskedLM (RoFormer model)
- squeezebert — SqueezeBertForMaskedLM (SqueezeBERT model)
- tapas — TapasForMaskedLM (TAPAS model)
- wav2vec2 — Wav2Vec2ForMaskedLM (Wav2Vec2 model)
- xlm — XLMWithLMHeadModel (XLM model)
- xlm-roberta — XLMRobertaForMaskedLM (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
- xmod — XmodForMaskedLM (X-MOD model)
- yoso — YosoForMaskedLM (YOSO model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back to training mode with model.train().
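The eval/train toggle described above can be sketched with a toy module: a dropout-like layer behaves differently depending on the mode. This is an illustrative stand-in, not torch.nn.Module, and the halving in training mode is an arbitrary placeholder for dropout's stochastic masking:

```python
# Sketch of model.eval() / model.train(): mode-dependent modules such as
# dropout are only active in training mode. Toy class, illustrative only.

class ToyDropoutModel:
    def __init__(self):
        self.training = True  # models start in training mode

    def eval(self):
        self.training = False
        return self

    def train(self):
        self.training = True
        return self

    def forward(self, x):
        # "Dropout" (here just a deterministic halving) only in training.
        return x * 0.5 if self.training else x

model = ToyDropoutModel().eval()  # from_pretrained puts models in eval mode
out_eval = model.forward(1.0)
out_train = model.train().forward(1.0)
```

In eval mode the input passes through unchanged; switching back with `train()` re-enables the mode-dependent behavior, which is why you must call `model.train()` before fine-tuning a freshly loaded model.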
Examples:
>>> from transformers import AutoConfig, AutoModelForMaskedLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMaskedLM.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForMaskedLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForMaskedLM (ALBERT model)
- BertConfig configuration class: TFBertForMaskedLM (BERT model)
- CamembertConfig configuration class: TFCamembertForMaskedLM (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertForMaskedLM (ConvBERT model)
- DebertaConfig configuration class: TFDebertaForMaskedLM (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2ForMaskedLM (DeBERTa-v2 model)
- DistilBertConfig configuration class: TFDistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: TFElectraForMaskedLM (ELECTRA model)
- EsmConfig configuration class: TFEsmForMaskedLM (ESM model)
- FlaubertConfig configuration class: TFFlaubertWithLMHeadModel (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForMaskedLM (Funnel Transformer model)
- LayoutLMConfig configuration class: TFLayoutLMForMaskedLM (LayoutLM model)
- LongformerConfig configuration class: TFLongformerForMaskedLM (Longformer model)
- MPNetConfig configuration class: TFMPNetForMaskedLM (MPNet model)
- MobileBertConfig configuration class: TFMobileBertForMaskedLM (MobileBERT model)
- RemBertConfig configuration class: TFRemBertForMaskedLM (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForMaskedLM (RoFormer model)
- RobertaConfig configuration class: TFRobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- TapasConfig configuration class: TFTapasForMaskedLM (TAPAS model)
- XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForMaskedLM (ALBERT model)
- bert — TFBertForMaskedLM (BERT model)
- camembert — TFCamembertForMaskedLM (CamemBERT model)
- convbert — TFConvBertForMaskedLM (ConvBERT model)
- deberta — TFDebertaForMaskedLM (DeBERTa model)
- deberta-v2 — TFDebertaV2ForMaskedLM (DeBERTa-v2 model)
- distilbert — TFDistilBertForMaskedLM (DistilBERT model)
- electra — TFElectraForMaskedLM (ELECTRA model)
- esm — TFEsmForMaskedLM (ESM model)
- flaubert — TFFlaubertWithLMHeadModel (FlauBERT model)
- funnel — TFFunnelForMaskedLM (Funnel Transformer model)
- layoutlm — TFLayoutLMForMaskedLM (LayoutLM model)
- longformer — TFLongformerForMaskedLM (Longformer model)
- mobilebert — TFMobileBertForMaskedLM (MobileBERT model)
- mpnet — TFMPNetForMaskedLM (MPNet model)
- rembert — TFRemBertForMaskedLM (RemBERT model)
- roberta — TFRobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerForMaskedLM (RoFormer model)
- tapas — TFTapasForMaskedLM (TAPAS model)
- xlm — TFXLMWithLMHeadModel (XLM model)
- xlm-roberta — TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForMaskedLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMaskedLM.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForMaskedLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a masked language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForMaskedLM (ALBERT model)
- BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
- BertConfig configuration class: FlaxBertForMaskedLM (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForMaskedLM (BigBird model)
- DistilBertConfig configuration class: FlaxDistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraForMaskedLM (ELECTRA model)
- MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
- RoFormerConfig configuration class: FlaxRoFormerForMaskedLM (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a masked language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a masked language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForMaskedLM (ALBERT model)
- bart — FlaxBartForConditionalGeneration (BART model)
- bert — FlaxBertForMaskedLM (BERT model)
- big_bird — FlaxBigBirdForMaskedLM (BigBird model)
- distilbert — FlaxDistilBertForMaskedLM (DistilBERT model)
- electra — FlaxElectraForMaskedLM (ELECTRA model)
- mbart — FlaxMBartForConditionalGeneration (mBART model)
- roberta — FlaxRobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerForMaskedLM (RoFormer model)
- xlm-roberta — FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForMaskedLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForMaskedLM.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForMaskGeneration
TFAutoModelForMaskGeneration
AutoModelForSeq2SeqLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BartConfig configuration class: BartForConditionalGeneration (BART model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusForConditionalGeneration (BigBird-Pegasus model)
- BlenderbotConfig configuration class: BlenderbotForConditionalGeneration (Blenderbot model)
- BlenderbotSmallConfig configuration class: BlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- EncoderDecoderConfig configuration class: EncoderDecoderModel (Encoder decoder model)
- FSMTConfig configuration class: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
- GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- GraniteSpeechConfig configuration class: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
- LEDConfig configuration class: LEDForConditionalGeneration (LED model)
- LongT5Config configuration class: LongT5ForConditionalGeneration (LongT5 model)
- M2M100Config configuration class: M2M100ForConditionalGeneration (M2M100 model)
- MBartConfig configuration class: MBartForConditionalGeneration (mBART model)
- MT5Config configuration class: MT5ForConditionalGeneration (MT5 model)
- MarianConfig configuration class: MarianMTModel (Marian model)
- MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
- NllbMoeConfig configuration class: NllbMoeForConditionalGeneration (NLLB-MOE model)
- PLBartConfig configuration class: PLBartForConditionalGeneration (PLBart model)
- PegasusConfig configuration class: PegasusForConditionalGeneration (Pegasus model)
- PegasusXConfig configuration class: PegasusXForConditionalGeneration (PEGASUS-X model)
- ProphetNetConfig configuration class: ProphetNetForConditionalGeneration (ProphetNet model)
- Qwen2AudioConfig configuration class: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
- SeamlessM4TConfig configuration class: SeamlessM4TForTextToText (SeamlessM4T model)
- SeamlessM4Tv2Config configuration class: SeamlessM4Tv2ForTextToText (SeamlessM4Tv2 model)
- SwitchTransformersConfig configuration class: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
- T5Config configuration class: T5ForConditionalGeneration (T5 model)
- T5GemmaConfig configuration class: T5GemmaForConditionalGeneration (T5Gemma model)
- UMT5Config configuration class: UMT5ForConditionalGeneration (UMT5 model)
- VoxtralConfig configuration class: VoxtralForConditionalGeneration (Voxtral model)
- XLMProphetNetConfig configuration class: XLMProphetNetForConditionalGeneration (XLM-ProphetNet model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
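For instance, a randomly initialized seq2seq model can be built from a freshly constructed configuration without downloading any weights. This is only a sketch: the tiny hyperparameters below are hypothetical, chosen to keep the example fast, not taken from any released checkpoint.

```python
from transformers import AutoModelForSeq2SeqLM, T5Config

# Hypothetical, deliberately tiny hyperparameters: no weights are downloaded,
# the model is randomly initialized from the configuration alone.
config = T5Config(
    vocab_size=128,
    d_model=32,
    d_kv=8,
    d_ff=64,
    num_layers=2,
    num_heads=4,
)
model = AutoModelForSeq2SeqLM.from_config(config)

# from_config() dispatches on the configuration class
print(type(model).__name__)  # T5ForConditionalGeneration
```

Because dispatch is on the configuration class, passing a `T5Config` yields a `T5ForConditionalGeneration` even though no checkpoint name was involved.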
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a tensorflow index checkpoint file (e.g,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults toFalse) — Load the model weights from a TensorFlow checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
- If a configuration is provided with
Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bart — BartForConditionalGeneration (BART model)
- bigbird_pegasus — BigBirdPegasusForConditionalGeneration (BigBird-Pegasus model)
- blenderbot — BlenderbotForConditionalGeneration (Blenderbot model)
- blenderbot-small — BlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- encoder-decoder — EncoderDecoderModel (Encoder decoder model)
- fsmt — FSMTForConditionalGeneration (FairSeq Machine-Translation model)
- gptsan-japanese — GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- granite_speech — GraniteSpeechForConditionalGeneration (GraniteSpeech model)
- led — LEDForConditionalGeneration (LED model)
- longt5 — LongT5ForConditionalGeneration (LongT5 model)
- m2m_100 — M2M100ForConditionalGeneration (M2M100 model)
- marian — MarianMTModel (Marian model)
- mbart — MBartForConditionalGeneration (mBART model)
- mt5 — MT5ForConditionalGeneration (MT5 model)
- mvp — MvpForConditionalGeneration (MVP model)
- nllb-moe — NllbMoeForConditionalGeneration (NLLB-MOE model)
- pegasus — PegasusForConditionalGeneration (Pegasus model)
- pegasus_x — PegasusXForConditionalGeneration (PEGASUS-X model)
- plbart — PLBartForConditionalGeneration (PLBart model)
- prophetnet — ProphetNetForConditionalGeneration (ProphetNet model)
- qwen2_audio — Qwen2AudioForConditionalGeneration (Qwen2Audio model)
- seamless_m4t — SeamlessM4TForTextToText (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2ForTextToText (SeamlessM4Tv2 model)
- switch_transformers — SwitchTransformersForConditionalGeneration (SwitchTransformers model)
- t5 — T5ForConditionalGeneration (T5 model)
- t5gemma — T5GemmaForConditionalGeneration (T5Gemma model)
- umt5 — UMT5ForConditionalGeneration (UMT5 model)
- voxtral — VoxtralForConditionalGeneration (Voxtral model)
- xlm-prophetnet — XLMProphetNetForConditionalGeneration (XLM-ProphetNet model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train()
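A minimal sketch of toggling between the two modes, using a tiny randomly initialized model built from a hypothetical config so nothing is downloaded (note that from_config(), unlike from_pretrained(), leaves the model in training mode):

```python
from transformers import AutoModelForSeq2SeqLM, T5Config

# Hypothetical tiny config: the model is randomly initialized, not downloaded.
config = T5Config(vocab_size=128, d_model=32, d_kv=8, d_ff=64, num_layers=2, num_heads=4)
model = AutoModelForSeq2SeqLM.from_config(config)

model.eval()                # what from_pretrained() does for you: dropout disabled
assert not model.training

model.train()               # switch back before fine-tuning: dropout active again
assert model.training
```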
Examples:
>>> from transformers import AutoConfig, AutoModelForSeq2SeqLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> # Update configuration during loading
>>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/t5_tf_model_config.json")
>>> model = AutoModelForSeq2SeqLM.from_pretrained(
... "./tf_model/t5_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForSeq2SeqLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BartConfig configuration class: TFBartForConditionalGeneration (BART model)
- BlenderbotConfig configuration class: TFBlenderbotForConditionalGeneration (Blenderbot model)
- BlenderbotSmallConfig configuration class: TFBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- EncoderDecoderConfig configuration class: TFEncoderDecoderModel (Encoder decoder model)
- LEDConfig configuration class: TFLEDForConditionalGeneration (LED model)
- MBartConfig configuration class: TFMBartForConditionalGeneration (mBART model)
- MT5Config configuration class: TFMT5ForConditionalGeneration (MT5 model)
- MarianConfig configuration class: TFMarianMTModel (Marian model)
- PegasusConfig configuration class: TFPegasusForConditionalGeneration (Pegasus model)
- T5Config configuration class: TFT5ForConditionalGeneration (T5 model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
- If a configuration is provided with
Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bart — TFBartForConditionalGeneration (BART model)
- blenderbot — TFBlenderbotForConditionalGeneration (Blenderbot model)
- blenderbot-small — TFBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- encoder-decoder — TFEncoderDecoderModel (Encoder decoder model)
- led — TFLEDForConditionalGeneration (LED model)
- marian — TFMarianMTModel (Marian model)
- mbart — TFMBartForConditionalGeneration (mBART model)
- mt5 — TFMT5ForConditionalGeneration (MT5 model)
- pegasus — TFPegasusForConditionalGeneration (Pegasus model)
- t5 — TFT5ForConditionalGeneration (T5 model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForSeq2SeqLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> # Update configuration during loading
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/t5_pt_model_config.json")
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(
... "./pt_model/t5_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForSeq2SeqLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
- BlenderbotConfig configuration class: FlaxBlenderbotForConditionalGeneration (Blenderbot model)
- BlenderbotSmallConfig configuration class: FlaxBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- EncoderDecoderConfig configuration class: FlaxEncoderDecoderModel (Encoder decoder model)
- LongT5Config configuration class: FlaxLongT5ForConditionalGeneration (LongT5 model)
- MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
- MT5Config configuration class: FlaxMT5ForConditionalGeneration (MT5 model)
- MarianConfig configuration class: FlaxMarianMTModel (Marian model)
- PegasusConfig configuration class: FlaxPegasusForConditionalGeneration (Pegasus model)
- T5Config configuration class: FlaxT5ForConditionalGeneration (T5 model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
- If a configuration is provided with
Instantiate one of the model classes of the library (with a sequence-to-sequence language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bart — FlaxBartForConditionalGeneration (BART model)
- blenderbot — FlaxBlenderbotForConditionalGeneration (Blenderbot model)
- blenderbot-small — FlaxBlenderbotSmallForConditionalGeneration (BlenderbotSmall model)
- encoder-decoder — FlaxEncoderDecoderModel (Encoder decoder model)
- longt5 — FlaxLongT5ForConditionalGeneration (LongT5 model)
- marian — FlaxMarianMTModel (Marian model)
- mbart — FlaxMBartForConditionalGeneration (mBART model)
- mt5 — FlaxMT5ForConditionalGeneration (MT5 model)
- pegasus — FlaxPegasusForConditionalGeneration (Pegasus model)
- t5 — FlaxT5ForConditionalGeneration (T5 model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForSeq2SeqLM
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/t5_pt_model_config.json")
>>> model = FlaxAutoModelForSeq2SeqLM.from_pretrained(
... "./pt_model/t5_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForSequenceClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForSequenceClassification (ALBERT model)
- ArceeConfig configuration class: ArceeForSequenceClassification (Arcee model)
- BartConfig configuration class: BartForSequenceClassification (BART model)
- BertConfig configuration class: BertForSequenceClassification (BERT model)
- BigBirdConfig configuration class: BigBirdForSequenceClassification (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusForSequenceClassification (BigBird-Pegasus model)
- BioGptConfig configuration class: BioGptForSequenceClassification (BioGpt model)
- BloomConfig configuration class: BloomForSequenceClassification (BLOOM model)
- CTRLConfig configuration class: CTRLForSequenceClassification (CTRL model)
- CamembertConfig configuration class: CamembertForSequenceClassification (CamemBERT model)
- CanineConfig configuration class: CanineForSequenceClassification (CANINE model)
- ConvBertConfig configuration class: ConvBertForSequenceClassification (ConvBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForSequenceClassification (Data2VecText model)
- DebertaConfig configuration class: DebertaForSequenceClassification (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForSequenceClassification (DeBERTa-v2 model)
- DeepseekV2Config configuration class: DeepseekV2ForSequenceClassification (DeepSeek-V2 model)
- DeepseekV3Config configuration class: DeepseekV3ForSequenceClassification (DeepSeek-V3 model)
- DiffLlamaConfig configuration class: DiffLlamaForSequenceClassification (DiffLlama model)
- DistilBertConfig configuration class: DistilBertForSequenceClassification (DistilBERT model)
- DogeConfig configuration class: DogeForSequenceClassification (Doge model)
- ElectraConfig configuration class: ElectraForSequenceClassification (ELECTRA model)
- ErnieConfig configuration class: ErnieForSequenceClassification (ERNIE model)
- ErnieMConfig configuration class: ErnieMForSequenceClassification (ErnieM model)
- EsmConfig configuration class: EsmForSequenceClassification (ESM model)
- Exaone4Config configuration class: Exaone4ForSequenceClassification (EXAONE-4.0 model)
- FNetConfig configuration class: FNetForSequenceClassification (FNet model)
- FalconConfig configuration class: FalconForSequenceClassification (Falcon model)
- FlaubertConfig configuration class: FlaubertForSequenceClassification (FlauBERT model)
- FunnelConfig configuration class: FunnelForSequenceClassification (Funnel Transformer model)
- GPT2Config configuration class: GPT2ForSequenceClassification (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForSequenceClassification (GPTBigCode model)
- GPTJConfig configuration class: GPTJForSequenceClassification (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoForSequenceClassification (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXForSequenceClassification (GPT NeoX model)
- Gemma2Config configuration class: Gemma2ForSequenceClassification (Gemma2 model)
- Gemma3Config configuration class: Gemma3ForSequenceClassification (Gemma3ForConditionalGeneration model)
- Gemma3TextConfig configuration class: Gemma3TextForSequenceClassification (Gemma3ForCausalLM model)
- GemmaConfig configuration class: GemmaForSequenceClassification (Gemma model)
- Glm4Config configuration class: Glm4ForSequenceClassification (GLM4 model)
- GlmConfig configuration class: GlmForSequenceClassification (GLM model)
- GptOssConfig configuration class: GptOssForSequenceClassification (GptOss model)
- HeliumConfig configuration class: HeliumForSequenceClassification (Helium model)
- HunYuanDenseV1Config configuration class: HunYuanDenseV1ForSequenceClassification (HunYuanDenseV1 model)
- HunYuanMoEV1Config configuration class: HunYuanMoEV1ForSequenceClassification (HunYuanMoeV1 model)
- IBertConfig configuration class: IBertForSequenceClassification (I-BERT model)
- JambaConfig configuration class: JambaForSequenceClassification (Jamba model)
- JetMoeConfig configuration class: JetMoeForSequenceClassification (JetMoe model)
- LEDConfig configuration class: LEDForSequenceClassification (LED model)
- LayoutLMConfig configuration class: LayoutLMForSequenceClassification (LayoutLM model)
- LayoutLMv2Config configuration class: LayoutLMv2ForSequenceClassification (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3ForSequenceClassification (LayoutLMv3 model)
- LiltConfig configuration class: LiltForSequenceClassification (LiLT model)
- LlamaConfig configuration class: LlamaForSequenceClassification (LLaMA model)
- LongformerConfig configuration class: LongformerForSequenceClassification (Longformer model)
- LukeConfig configuration class: LukeForSequenceClassification (LUKE model)
- MBartConfig configuration class: MBartForSequenceClassification (mBART model)
- MPNetConfig configuration class: MPNetForSequenceClassification (MPNet model)
- MT5Config configuration class: MT5ForSequenceClassification (MT5 model)
- MarkupLMConfig configuration class: MarkupLMForSequenceClassification (MarkupLM model)
- MegaConfig configuration class: MegaForSequenceClassification (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForSequenceClassification (Megatron-BERT model)
- MiniMaxConfig configuration class: MiniMaxForSequenceClassification (MiniMax model)
- MinistralConfig configuration class: MinistralForSequenceClassification (Ministral model)
- MistralConfig configuration class: MistralForSequenceClassification (Mistral model)
- MixtralConfig configuration class: MixtralForSequenceClassification (Mixtral model)
- MobileBertConfig configuration class: MobileBertForSequenceClassification (MobileBERT model)
- ModernBertConfig configuration class: ModernBertForSequenceClassification (ModernBERT model)
- ModernBertDecoderConfig configuration class: ModernBertDecoderForSequenceClassification (ModernBertDecoder model)
- MptConfig configuration class: MptForSequenceClassification (MPT model)
- MraConfig configuration class: MraForSequenceClassification (MRA model)
- MvpConfig configuration class: MvpForSequenceClassification (MVP model)
- NemotronConfig configuration class: NemotronForSequenceClassification (Nemotron model)
- NezhaConfig configuration class: NezhaForSequenceClassification (Nezha model)
- NystromformerConfig configuration class: NystromformerForSequenceClassification (Nyströmformer model)
- OPTConfig configuration class: OPTForSequenceClassification (OPT model)
- OpenAIGPTConfig configuration class: OpenAIGPTForSequenceClassification (OpenAI GPT model)
- OpenLlamaConfig configuration class: OpenLlamaForSequenceClassification (OpenLlama model)
- PLBartConfig configuration class: PLBartForSequenceClassification (PLBart model)
- PerceiverConfig configuration class: PerceiverForSequenceClassification (Perceiver model)
- PersimmonConfig configuration class: PersimmonForSequenceClassification (Persimmon model)
- Phi3Config configuration class: Phi3ForSequenceClassification (Phi3 model)
- PhiConfig configuration class: PhiForSequenceClassification (Phi model)
- PhimoeConfig configuration class: PhimoeForSequenceClassification (Phimoe model)
- QDQBertConfig configuration class: QDQBertForSequenceClassification (QDQBert model)
- Qwen2Config configuration class: Qwen2ForSequenceClassification (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeForSequenceClassification (Qwen2MoE model)
- Qwen3Config configuration class: Qwen3ForSequenceClassification (Qwen3 model)
- Qwen3MoeConfig configuration class: Qwen3MoeForSequenceClassification (Qwen3MoE model)
- Qwen3NextConfig configuration class: Qwen3NextForSequenceClassification (Qwen3Next model)
- ReformerConfig configuration class: ReformerForSequenceClassification (Reformer model)
- RemBertConfig configuration class: RemBertForSequenceClassification (RemBERT model)
- RoCBertConfig configuration class: RoCBertForSequenceClassification (RoCBert model)
- RoFormerConfig configuration class: RoFormerForSequenceClassification (RoFormer model)
- RobertaConfig configuration class: RobertaForSequenceClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- SeedOssConfig configuration class: SeedOssForSequenceClassification (SeedOss model)
- SmolLM3Config configuration class: SmolLM3ForSequenceClassification (SmolLM3 model)
- SqueezeBertConfig configuration class: SqueezeBertForSequenceClassification (SqueezeBERT model)
- StableLmConfig configuration class: StableLmForSequenceClassification (StableLm model)
- Starcoder2Config configuration class: Starcoder2ForSequenceClassification (Starcoder2 model)
- T5Config configuration class: T5ForSequenceClassification (T5 model)
- T5GemmaConfig configuration class: T5GemmaForSequenceClassification (T5Gemma model)
- TapasConfig configuration class: TapasForSequenceClassification (TAPAS model)
- TransfoXLConfig configuration class: TransfoXLForSequenceClassification (Transformer-XL model)
- UMT5Config configuration class: UMT5ForSequenceClassification (UMT5 model)
- XLMConfig configuration class: XLMForSequenceClassification (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForSequenceClassification (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForSequenceClassification (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetForSequenceClassification (XLNet model)
- XmodConfig configuration class: XmodForSequenceClassification (X-MOD model)
- YosoConfig configuration class: YosoForSequenceClassification (YOSO model)
- Zamba2Config configuration class: Zamba2ForSequenceClassification (Zamba2 model)
- ZambaConfig configuration class: ZambaForSequenceClassification (Zamba model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available for torch>=2.1.1; otherwise the default is the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
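As a minimal sketch of this distinction, the following builds a randomly initialized classifier from a configuration alone; the tiny BertConfig values here are arbitrary, chosen only to keep the model small:

```python
from transformers import AutoModelForSequenceClassification, BertConfig

# A deliberately tiny, arbitrary configuration: from_config() builds the
# architecture with freshly initialized weights and downloads nothing.
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,
)
model = AutoModelForSequenceClassification.from_config(config)

print(type(model).__name__)     # BertForSequenceClassification
print(model.config.num_labels)  # 3
```

Because the weights are random, such a model is only useful as a starting point for training from scratch; use from_pretrained() when you want pretrained weights.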
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so code_revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForSequenceClassification (ALBERT model)
- arcee — ArceeForSequenceClassification (Arcee model)
- bart — BartForSequenceClassification (BART model)
- bert — BertForSequenceClassification (BERT model)
- big_bird — BigBirdForSequenceClassification (BigBird model)
- bigbird_pegasus — BigBirdPegasusForSequenceClassification (BigBird-Pegasus model)
- biogpt — BioGptForSequenceClassification (BioGpt model)
- bloom — BloomForSequenceClassification (BLOOM model)
- camembert — CamembertForSequenceClassification (CamemBERT model)
- canine — CanineForSequenceClassification (CANINE model)
- code_llama — LlamaForSequenceClassification (CodeLlama model)
- convbert — ConvBertForSequenceClassification (ConvBERT model)
- ctrl — CTRLForSequenceClassification (CTRL model)
- data2vec-text — Data2VecTextForSequenceClassification (Data2VecText model)
- deberta — DebertaForSequenceClassification (DeBERTa model)
- deberta-v2 — DebertaV2ForSequenceClassification (DeBERTa-v2 model)
- deepseek_v2 — DeepseekV2ForSequenceClassification (DeepSeek-V2 model)
- deepseek_v3 — DeepseekV3ForSequenceClassification (DeepSeek-V3 model)
- diffllama — DiffLlamaForSequenceClassification (DiffLlama model)
- distilbert — DistilBertForSequenceClassification (DistilBERT model)
- doge — DogeForSequenceClassification (Doge model)
- electra — ElectraForSequenceClassification (ELECTRA model)
- ernie — ErnieForSequenceClassification (ERNIE model)
- ernie_m — ErnieMForSequenceClassification (ErnieM model)
- esm — EsmForSequenceClassification (ESM model)
- exaone4 — Exaone4ForSequenceClassification (EXAONE-4.0 model)
- falcon — FalconForSequenceClassification (Falcon model)
- flaubert — FlaubertForSequenceClassification (FlauBERT model)
- fnet — FNetForSequenceClassification (FNet model)
- funnel — FunnelForSequenceClassification (Funnel Transformer model)
- gemma — GemmaForSequenceClassification (Gemma model)
- gemma2 — Gemma2ForSequenceClassification (Gemma2 model)
- gemma3 — Gemma3ForSequenceClassification (Gemma3ForConditionalGeneration model)
- gemma3_text — Gemma3TextForSequenceClassification (Gemma3ForCausalLM model)
- glm — GlmForSequenceClassification (GLM model)
- glm4 — Glm4ForSequenceClassification (GLM4 model)
- gpt-sw3 — GPT2ForSequenceClassification (GPT-Sw3 model)
- gpt2 — GPT2ForSequenceClassification (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeForSequenceClassification (GPTBigCode model)
- gpt_neo — GPTNeoForSequenceClassification (GPT Neo model)
- gpt_neox — GPTNeoXForSequenceClassification (GPT NeoX model)
- gpt_oss — GptOssForSequenceClassification (GptOss model)
- gptj — GPTJForSequenceClassification (GPT-J model)
- helium — HeliumForSequenceClassification (Helium model)
- hunyuan_v1_dense — HunYuanDenseV1ForSequenceClassification (HunYuanDenseV1 model)
- hunyuan_v1_moe — HunYuanMoEV1ForSequenceClassification (HunYuanMoeV1 model)
- ibert — IBertForSequenceClassification (I-BERT model)
- jamba — JambaForSequenceClassification (Jamba model)
- jetmoe — JetMoeForSequenceClassification (JetMoe model)
- layoutlm — LayoutLMForSequenceClassification (LayoutLM model)
- layoutlmv2 — LayoutLMv2ForSequenceClassification (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3ForSequenceClassification (LayoutLMv3 model)
- led — LEDForSequenceClassification (LED model)
- lilt — LiltForSequenceClassification (LiLT model)
- llama — LlamaForSequenceClassification (LLaMA model)
- longformer — LongformerForSequenceClassification (Longformer model)
- luke — LukeForSequenceClassification (LUKE model)
- markuplm — MarkupLMForSequenceClassification (MarkupLM model)
- mbart — MBartForSequenceClassification (mBART model)
- mega — MegaForSequenceClassification (MEGA model)
- megatron-bert — MegatronBertForSequenceClassification (Megatron-BERT model)
- minimax — MiniMaxForSequenceClassification (MiniMax model)
- ministral — MinistralForSequenceClassification (Ministral model)
- mistral — MistralForSequenceClassification (Mistral model)
- mixtral — MixtralForSequenceClassification (Mixtral model)
- mobilebert — MobileBertForSequenceClassification (MobileBERT model)
- modernbert — ModernBertForSequenceClassification (ModernBERT model)
- modernbert-decoder — ModernBertDecoderForSequenceClassification (ModernBertDecoder model)
- mpnet — MPNetForSequenceClassification (MPNet model)
- mpt — MptForSequenceClassification (MPT model)
- mra — MraForSequenceClassification (MRA model)
- mt5 — MT5ForSequenceClassification (MT5 model)
- mvp — MvpForSequenceClassification (MVP model)
- nemotron — NemotronForSequenceClassification (Nemotron model)
- nezha — NezhaForSequenceClassification (Nezha model)
- nystromformer — NystromformerForSequenceClassification (Nyströmformer model)
- open-llama — OpenLlamaForSequenceClassification (OpenLlama model)
- openai-gpt — OpenAIGPTForSequenceClassification (OpenAI GPT model)
- opt — OPTForSequenceClassification (OPT model)
- perceiver — PerceiverForSequenceClassification (Perceiver model)
- persimmon — PersimmonForSequenceClassification (Persimmon model)
- phi — PhiForSequenceClassification (Phi model)
- phi3 — Phi3ForSequenceClassification (Phi3 model)
- phimoe — PhimoeForSequenceClassification (Phimoe model)
- plbart — PLBartForSequenceClassification (PLBart model)
- qdqbert — QDQBertForSequenceClassification (QDQBert model)
- qwen2 — Qwen2ForSequenceClassification (Qwen2 model)
- qwen2_moe — Qwen2MoeForSequenceClassification (Qwen2MoE model)
- qwen3 — Qwen3ForSequenceClassification (Qwen3 model)
- qwen3_moe — Qwen3MoeForSequenceClassification (Qwen3MoE model)
- qwen3_next — Qwen3NextForSequenceClassification (Qwen3Next model)
- reformer — ReformerForSequenceClassification (Reformer model)
- rembert — RemBertForSequenceClassification (RemBERT model)
- roberta — RobertaForSequenceClassification (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForSequenceClassification (RoCBert model)
- roformer — RoFormerForSequenceClassification (RoFormer model)
- seed_oss — SeedOssForSequenceClassification (SeedOss model)
- smollm3 — SmolLM3ForSequenceClassification (SmolLM3 model)
- squeezebert — SqueezeBertForSequenceClassification (SqueezeBERT model)
- stablelm — StableLmForSequenceClassification (StableLm model)
- starcoder2 — Starcoder2ForSequenceClassification (Starcoder2 model)
- t5 — T5ForSequenceClassification (T5 model)
- t5gemma — T5GemmaForSequenceClassification (T5Gemma model)
- tapas — TapasForSequenceClassification (TAPAS model)
- transfo-xl — TransfoXLForSequenceClassification (Transformer-XL model)
- umt5 — UMT5ForSequenceClassification (UMT5 model)
- xlm — XLMForSequenceClassification (XLM model)
- xlm-roberta — XLMRobertaForSequenceClassification (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForSequenceClassification (XLM-RoBERTa-XL model)
- xlnet — XLNetForSequenceClassification (XLNet model)
- xmod — XmodForSequenceClassification (X-MOD model)
- yoso — YosoForSequenceClassification (YOSO model)
- zamba — ZambaForSequenceClassification (Zamba model)
- zamba2 — Zamba2ForSequenceClassification (Zamba2 model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
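The same eval()/train() toggle applies to any torch.nn.Module; a minimal sketch with a bare Dropout layer, used here only because its behavior depends on the mode:

```python
import torch.nn as nn

# Dropout is active only in training mode, which is why from_pretrained()
# models are put in evaluation mode by default: inference should be
# deterministic.
layer = nn.Dropout(p=0.5)

layer.eval()
print(layer.training)  # False

layer.train()
print(layer.training)  # True
```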
Examples:
>>> from transformers import AutoConfig, AutoModelForSequenceClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSequenceClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForSequenceClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForSequenceClassification (ALBERT model)
- BartConfig configuration class: TFBartForSequenceClassification (BART model)
- BertConfig configuration class: TFBertForSequenceClassification (BERT model)
- CTRLConfig configuration class: TFCTRLForSequenceClassification (CTRL model)
- CamembertConfig configuration class: TFCamembertForSequenceClassification (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertForSequenceClassification (ConvBERT model)
- DebertaConfig configuration class: TFDebertaForSequenceClassification (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2ForSequenceClassification (DeBERTa-v2 model)
- DistilBertConfig configuration class: TFDistilBertForSequenceClassification (DistilBERT model)
- ElectraConfig configuration class: TFElectraForSequenceClassification (ELECTRA model)
- EsmConfig configuration class: TFEsmForSequenceClassification (ESM model)
- FlaubertConfig configuration class: TFFlaubertForSequenceClassification (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForSequenceClassification (Funnel Transformer model)
- GPT2Config configuration class: TFGPT2ForSequenceClassification (OpenAI GPT-2 model)
- GPTJConfig configuration class: TFGPTJForSequenceClassification (GPT-J model)
- LayoutLMConfig configuration class: TFLayoutLMForSequenceClassification (LayoutLM model)
- LayoutLMv3Config configuration class: TFLayoutLMv3ForSequenceClassification (LayoutLMv3 model)
- LongformerConfig configuration class: TFLongformerForSequenceClassification (Longformer model)
- MPNetConfig configuration class: TFMPNetForSequenceClassification (MPNet model)
- MistralConfig configuration class: TFMistralForSequenceClassification (Mistral model)
- MobileBertConfig configuration class: TFMobileBertForSequenceClassification (MobileBERT model)
- OpenAIGPTConfig configuration class: TFOpenAIGPTForSequenceClassification (OpenAI GPT model)
- RemBertConfig configuration class: TFRemBertForSequenceClassification (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForSequenceClassification (RoFormer model)
- RobertaConfig configuration class: TFRobertaForSequenceClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- TapasConfig configuration class: TFTapasForSequenceClassification (TAPAS model)
- TransfoXLConfig configuration class: TFTransfoXLForSequenceClassification (Transformer-XL model)
- XLMConfig configuration class: TFXLMForSequenceClassification (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForSequenceClassification (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetForSequenceClassification (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, SDPA is used if available for torch>=2.1.1; otherwise the default is the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so code_revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will first be passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForSequenceClassification (ALBERT model)
- bart — TFBartForSequenceClassification (BART model)
- bert — TFBertForSequenceClassification (BERT model)
- camembert — TFCamembertForSequenceClassification (CamemBERT model)
- convbert — TFConvBertForSequenceClassification (ConvBERT model)
- ctrl — TFCTRLForSequenceClassification (CTRL model)
- deberta — TFDebertaForSequenceClassification (DeBERTa model)
- deberta-v2 — TFDebertaV2ForSequenceClassification (DeBERTa-v2 model)
- distilbert — TFDistilBertForSequenceClassification (DistilBERT model)
- electra — TFElectraForSequenceClassification (ELECTRA model)
- esm — TFEsmForSequenceClassification (ESM model)
- flaubert — TFFlaubertForSequenceClassification (FlauBERT model)
- funnel — TFFunnelForSequenceClassification (Funnel Transformer model)
- gpt-sw3 — TFGPT2ForSequenceClassification (GPT-Sw3 model)
- gpt2 — TFGPT2ForSequenceClassification (OpenAI GPT-2 model)
- gptj — TFGPTJForSequenceClassification (GPT-J model)
- layoutlm — TFLayoutLMForSequenceClassification (LayoutLM model)
- layoutlmv3 — TFLayoutLMv3ForSequenceClassification (LayoutLMv3 model)
- longformer — TFLongformerForSequenceClassification (Longformer model)
- mistral — TFMistralForSequenceClassification (Mistral model)
- mobilebert — TFMobileBertForSequenceClassification (MobileBERT model)
- mpnet — TFMPNetForSequenceClassification (MPNet model)
- openai-gpt — TFOpenAIGPTForSequenceClassification (OpenAI GPT model)
- rembert — TFRemBertForSequenceClassification (RemBERT model)
- roberta — TFRobertaForSequenceClassification (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerForSequenceClassification (RoFormer model)
- tapas — TFTapasForSequenceClassification (TAPAS model)
- transfo-xl — TFTransfoXLForSequenceClassification (Transformer-XL model)
- xlm — TFXLMForSequenceClassification (XLM model)
- xlm-roberta — TFXLMRobertaForSequenceClassification (XLM-RoBERTa model)
- xlnet — TFXLNetForSequenceClassification (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForSequenceClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForSequenceClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForSequenceClassification (ALBERT model)
- BartConfig configuration class: FlaxBartForSequenceClassification (BART model)
- BertConfig configuration class: FlaxBertForSequenceClassification (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForSequenceClassification (BigBird model)
- DistilBertConfig configuration class: FlaxDistilBertForSequenceClassification (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraForSequenceClassification (ELECTRA model)
- MBartConfig configuration class: FlaxMBartForSequenceClassification (mBART model)
- RoFormerConfig configuration class: FlaxRoFormerForSequenceClassification (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForSequenceClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForSequenceClassification (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a sequence classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForSequenceClassification (ALBERT model)
- bart — FlaxBartForSequenceClassification (BART model)
- bert — FlaxBertForSequenceClassification (BERT model)
- big_bird — FlaxBigBirdForSequenceClassification (BigBird model)
- distilbert — FlaxDistilBertForSequenceClassification (DistilBERT model)
- electra — FlaxElectraForSequenceClassification (ELECTRA model)
- mbart — FlaxMBartForSequenceClassification (mBART model)
- roberta — FlaxRobertaForSequenceClassification (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForSequenceClassification (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerForSequenceClassification (RoFormer model)
- xlm-roberta — FlaxXLMRobertaForSequenceClassification (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForSequenceClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForSequenceClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForMultipleChoice
This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForMultipleChoice (ALBERT model)
- BertConfig configuration class: BertForMultipleChoice (BERT model)
- BigBirdConfig configuration class: BigBirdForMultipleChoice (BigBird model)
- CamembertConfig configuration class: CamembertForMultipleChoice (CamemBERT model)
- CanineConfig configuration class: CanineForMultipleChoice (CANINE model)
- ConvBertConfig configuration class: ConvBertForMultipleChoice (ConvBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForMultipleChoice (Data2VecText model)
- DebertaV2Config configuration class: DebertaV2ForMultipleChoice (DeBERTa-v2 model)
- DistilBertConfig configuration class: DistilBertForMultipleChoice (DistilBERT model)
- ElectraConfig configuration class: ElectraForMultipleChoice (ELECTRA model)
- ErnieConfig configuration class: ErnieForMultipleChoice (ERNIE model)
- ErnieMConfig configuration class: ErnieMForMultipleChoice (ErnieM model)
- FNetConfig configuration class: FNetForMultipleChoice (FNet model)
- FlaubertConfig configuration class: FlaubertForMultipleChoice (FlauBERT model)
- FunnelConfig configuration class: FunnelForMultipleChoice (Funnel Transformer model)
- IBertConfig configuration class: IBertForMultipleChoice (I-BERT model)
- LongformerConfig configuration class: LongformerForMultipleChoice (Longformer model)
- LukeConfig configuration class: LukeForMultipleChoice (LUKE model)
- MPNetConfig configuration class: MPNetForMultipleChoice (MPNet model)
- MegaConfig configuration class: MegaForMultipleChoice (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForMultipleChoice (Megatron-BERT model)
- MobileBertConfig configuration class: MobileBertForMultipleChoice (MobileBERT model)
- ModernBertConfig configuration class: ModernBertForMultipleChoice (ModernBERT model)
- MraConfig configuration class: MraForMultipleChoice (MRA model)
- NezhaConfig configuration class: NezhaForMultipleChoice (Nezha model)
- NystromformerConfig configuration class: NystromformerForMultipleChoice (Nyströmformer model)
- QDQBertConfig configuration class: QDQBertForMultipleChoice (QDQBert model)
- RemBertConfig configuration class: RemBertForMultipleChoice (RemBERT model)
- RoCBertConfig configuration class: RoCBertForMultipleChoice (RoCBert model)
- RoFormerConfig configuration class: RoFormerForMultipleChoice (RoFormer model)
- RobertaConfig configuration class: RobertaForMultipleChoice (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- SqueezeBertConfig configuration class: SqueezeBertForMultipleChoice (SqueezeBERT model)
- XLMConfig configuration class: XLMForMultipleChoice (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForMultipleChoice (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForMultipleChoice (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetForMultipleChoice (XLNet model)
- XmodConfig configuration class: XmodForMultipleChoice (X-MOD model)
- YosoConfig configuration class: YosoForMultipleChoice (YOSO model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
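As a quick illustration, from_config() can build a randomly initialized model without touching the Hub. The sketch below is illustrative and assumes transformers and torch are installed; the tiny BertConfig values are arbitrary and chosen only to keep instantiation cheap:

```python
# Sketch: build a randomly initialized multiple-choice model from a
# configuration alone (no pretrained weights are downloaded).
from transformers import AutoModelForMultipleChoice, BertConfig

# A deliberately tiny BertConfig; these sizes are illustrative, not
# recommended values for real training.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# AutoModelForMultipleChoice dispatches on the config class,
# so a BertConfig yields a BertForMultipleChoice instance.
model = AutoModelForMultipleChoice.from_config(config)
print(type(model).__name__)  # BertForMultipleChoice
```

Because only the configuration is used, the resulting weights are random; use from_pretrained() when you need a trained checkpoint.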
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForMultipleChoice (ALBERT model)
- bert — BertForMultipleChoice (BERT model)
- big_bird — BigBirdForMultipleChoice (BigBird model)
- camembert — CamembertForMultipleChoice (CamemBERT model)
- canine — CanineForMultipleChoice (CANINE model)
- convbert — ConvBertForMultipleChoice (ConvBERT model)
- data2vec-text — Data2VecTextForMultipleChoice (Data2VecText model)
- deberta-v2 — DebertaV2ForMultipleChoice (DeBERTa-v2 model)
- distilbert — DistilBertForMultipleChoice (DistilBERT model)
- electra — ElectraForMultipleChoice (ELECTRA model)
- ernie — ErnieForMultipleChoice (ERNIE model)
- ernie_m — ErnieMForMultipleChoice (ErnieM model)
- flaubert — FlaubertForMultipleChoice (FlauBERT model)
- fnet — FNetForMultipleChoice (FNet model)
- funnel — FunnelForMultipleChoice (Funnel Transformer model)
- ibert — IBertForMultipleChoice (I-BERT model)
- longformer — LongformerForMultipleChoice (Longformer model)
- luke — LukeForMultipleChoice (LUKE model)
- mega — MegaForMultipleChoice (MEGA model)
- megatron-bert — MegatronBertForMultipleChoice (Megatron-BERT model)
- mobilebert — MobileBertForMultipleChoice (MobileBERT model)
- modernbert — ModernBertForMultipleChoice (ModernBERT model)
- mpnet — MPNetForMultipleChoice (MPNet model)
- mra — MraForMultipleChoice (MRA model)
- nezha — NezhaForMultipleChoice (Nezha model)
- nystromformer — NystromformerForMultipleChoice (Nyströmformer model)
- qdqbert — QDQBertForMultipleChoice (QDQBert model)
- rembert — RemBertForMultipleChoice (RemBERT model)
- roberta — RobertaForMultipleChoice (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForMultipleChoice (RoCBert model)
- roformer — RoFormerForMultipleChoice (RoFormer model)
- squeezebert — SqueezeBertForMultipleChoice (SqueezeBERT model)
- xlm — XLMForMultipleChoice (XLM model)
- xlm-roberta — XLMRobertaForMultipleChoice (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForMultipleChoice (XLM-RoBERTa-XL model)
- xlnet — XLNetForMultipleChoice (XLNet model)
- xmod — XmodForMultipleChoice (X-MOD model)
- yoso — YosoForMultipleChoice (YOSO model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
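The eval/train toggle and the multiple-choice input contract can be sketched as follows. This is a minimal illustration, assuming transformers and torch are installed; it uses a deliberately tiny, randomly initialized BertConfig so that no checkpoint download is needed:

```python
# Sketch: eval/train mode and multiple-choice tensor shapes with a tiny,
# randomly initialized model (illustrative config values, no download).
import torch
from transformers import AutoModelForMultipleChoice, BertConfig

config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=1,
    num_attention_heads=2, intermediate_size=64,
)
model = AutoModelForMultipleChoice.from_config(config)

model.eval()                 # disables dropout, as after from_pretrained()
assert not model.training
model.train()                # switch back before fine-tuning
assert model.training
model.eval()

# Multiple-choice models take input_ids of shape (batch, num_choices, seq_len)
# and return one logit per choice, i.e. shape (batch, num_choices).
batch, num_choices, seq_len = 2, 4, 8
input_ids = torch.randint(0, config.vocab_size, (batch, num_choices, seq_len))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits
print(logits.shape)  # torch.Size([2, 4])
```

For training, the (batch, num_choices) logits are typically paired with a labels tensor of shape (batch,) holding the index of the correct choice.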
Examples:
>>> from transformers import AutoConfig, AutoModelForMultipleChoice
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMultipleChoice.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForMultipleChoice
This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForMultipleChoice (ALBERT model)
- BertConfig configuration class: TFBertForMultipleChoice (BERT model)
- CamembertConfig configuration class: TFCamembertForMultipleChoice (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertForMultipleChoice (ConvBERT model)
- DebertaV2Config configuration class: TFDebertaV2ForMultipleChoice (DeBERTa-v2 model)
- DistilBertConfig configuration class: TFDistilBertForMultipleChoice (DistilBERT model)
- ElectraConfig configuration class: TFElectraForMultipleChoice (ELECTRA model)
- FlaubertConfig configuration class: TFFlaubertForMultipleChoice (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForMultipleChoice (Funnel Transformer model)
- LongformerConfig configuration class: TFLongformerForMultipleChoice (Longformer model)
- MPNetConfig configuration class: TFMPNetForMultipleChoice (MPNet model)
- MobileBertConfig configuration class: TFMobileBertForMultipleChoice (MobileBERT model)
- RemBertConfig configuration class: TFRemBertForMultipleChoice (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForMultipleChoice (RoFormer model)
- RobertaConfig configuration class: TFRobertaForMultipleChoice (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- XLMConfig configuration class: TFXLMForMultipleChoice (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForMultipleChoice (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetForMultipleChoice (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForMultipleChoice (ALBERT model)
- bert — TFBertForMultipleChoice (BERT model)
- camembert — TFCamembertForMultipleChoice (CamemBERT model)
- convbert — TFConvBertForMultipleChoice (ConvBERT model)
- deberta-v2 — TFDebertaV2ForMultipleChoice (DeBERTa-v2 model)
- distilbert — TFDistilBertForMultipleChoice (DistilBERT model)
- electra — TFElectraForMultipleChoice (ELECTRA model)
- flaubert — TFFlaubertForMultipleChoice (FlauBERT model)
- funnel — TFFunnelForMultipleChoice (Funnel Transformer model)
- longformer — TFLongformerForMultipleChoice (Longformer model)
- mobilebert — TFMobileBertForMultipleChoice (MobileBERT model)
- mpnet — TFMPNetForMultipleChoice (MPNet model)
- rembert — TFRemBertForMultipleChoice (RemBERT model)
- roberta — TFRobertaForMultipleChoice (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerForMultipleChoice (RoFormer model)
- xlm — TFXLMForMultipleChoice (XLM model)
- xlm-roberta — TFXLMRobertaForMultipleChoice (XLM-RoBERTa model)
- xlnet — TFXLNetForMultipleChoice (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForMultipleChoice
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMultipleChoice.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForMultipleChoice
This is a generic model class that will be instantiated as one of the model classes of the library (with a multiple choice head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForMultipleChoice (ALBERT model)
- BertConfig configuration class: FlaxBertForMultipleChoice (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForMultipleChoice (BigBird model)
- DistilBertConfig configuration class: FlaxDistilBertForMultipleChoice (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraForMultipleChoice (ELECTRA model)
- RoFormerConfig configuration class: FlaxRoFormerForMultipleChoice (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForMultipleChoice (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForMultipleChoice (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a multiple choice head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g,
./pt_model/pytorch_model.bin). In this case,from_ptshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the PyTorch model in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a multiple choice head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForMultipleChoice (ALBERT model)
- bert — FlaxBertForMultipleChoice (BERT model)
- big_bird — FlaxBigBirdForMultipleChoice (BigBird model)
- distilbert — FlaxDistilBertForMultipleChoice (DistilBERT model)
- electra — FlaxElectraForMultipleChoice (ELECTRA model)
- roberta — FlaxRobertaForMultipleChoice (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForMultipleChoice (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerForMultipleChoice (RoFormer model)
- xlm-roberta — FlaxXLMRobertaForMultipleChoice (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForMultipleChoice
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForMultipleChoice.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForNextSentencePrediction
This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BertConfig configuration class: BertForNextSentencePrediction (BERT model)
- ErnieConfig configuration class: ErnieForNextSentencePrediction (ERNIE model)
- FNetConfig configuration class: FNetForNextSentencePrediction (FNet model)
- MegatronBertConfig configuration class: MegatronBertForNextSentencePrediction (Megatron-BERT model)
- MobileBertConfig configuration class: MobileBertForNextSentencePrediction (MobileBERT model)
- NezhaConfig configuration class: NezhaForNextSentencePrediction (Nezha model)
- QDQBertConfig configuration class: QDQBertForNextSentencePrediction (QDQBert model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
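As a minimal sketch of from_config (assuming transformers with PyTorch installed; the tiny BertConfig hyperparameters are arbitrary, chosen only so the randomly initialized model stays small):

```python
from transformers import AutoModelForNextSentencePrediction, BertConfig

# from_config builds the architecture with randomly initialized weights:
# nothing is downloaded and no pretrained weights are loaded.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = AutoModelForNextSentencePrediction.from_config(config)

# The concrete class is selected from the configuration class.
print(type(model).__name__)  # BertForNextSentencePrediction
```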
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bert — BertForNextSentencePrediction (BERT model)
- ernie — ErnieForNextSentencePrediction (ERNIE model)
- fnet — FNetForNextSentencePrediction (FNet model)
- megatron-bert — MegatronBertForNextSentencePrediction (Megatron-BERT model)
- mobilebert — MobileBertForNextSentencePrediction (MobileBERT model)
- nezha — NezhaForNextSentencePrediction (Nezha model)
- qdqbert — QDQBertForNextSentencePrediction (QDQBert model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
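For example, a short sketch of toggling between the two modes (assuming PyTorch; the tiny BertConfig values are arbitrary and only keep the randomly initialized model small):

```python
from transformers import AutoModelForNextSentencePrediction, BertConfig

config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = AutoModelForNextSentencePrediction.from_config(config)

model.eval()   # dropout disabled; from_pretrained() puts the model in this mode for you
assert not model.training
model.train()  # switch back to training mode before fine-tuning
assert model.training
```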
Examples:
>>> from transformers import AutoConfig, AutoModelForNextSentencePrediction
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForNextSentencePrediction.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForNextSentencePrediction
This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BertConfig configuration class: TFBertForNextSentencePrediction (BERT model)
- MobileBertConfig configuration class: TFMobileBertForNextSentencePrediction (MobileBERT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
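A minimal sketch for the TensorFlow variant (assuming transformers with TensorFlow installed; the tiny BertConfig hyperparameters are arbitrary, chosen only so the randomly initialized model stays small):

```python
from transformers import BertConfig, TFAutoModelForNextSentencePrediction

# from_config builds the architecture with randomly initialized weights:
# nothing is downloaded and no pretrained weights are loaded.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = TFAutoModelForNextSentencePrediction.from_config(config)

# The concrete class is selected from the configuration class.
print(type(model).__name__)  # TFBertForNextSentencePrediction
```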
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bert — TFBertForNextSentencePrediction (BERT model)
- mobilebert — TFMobileBertForNextSentencePrediction (MobileBERT model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForNextSentencePrediction
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForNextSentencePrediction.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForNextSentencePrediction
This is a generic model class that will be instantiated as one of the model classes of the library (with a next sentence prediction head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BertConfig configuration class: FlaxBertForNextSentencePrediction (BERT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a next sentence prediction head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a next sentence prediction head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- bert — FlaxBertForNextSentencePrediction (BERT model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForNextSentencePrediction
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForNextSentencePrediction.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForTokenClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForTokenClassification (ALBERT model)
- ApertusConfig configuration class: ApertusForTokenClassification (Apertus model)
- ArceeConfig configuration class: ArceeForTokenClassification (Arcee model)
- BertConfig configuration class: BertForTokenClassification (BERT model)
- BigBirdConfig configuration class: BigBirdForTokenClassification (BigBird model)
- BioGptConfig configuration class: BioGptForTokenClassification (BioGpt model)
- BloomConfig configuration class: BloomForTokenClassification (BLOOM model)
- BrosConfig configuration class: BrosForTokenClassification (BROS model)
- CamembertConfig configuration class: CamembertForTokenClassification (CamemBERT model)
- CanineConfig configuration class: CanineForTokenClassification (CANINE model)
- ConvBertConfig configuration class: ConvBertForTokenClassification (ConvBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForTokenClassification (Data2VecText model)
- DebertaConfig configuration class: DebertaForTokenClassification (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForTokenClassification (DeBERTa-v2 model)
- DeepseekV3Config configuration class: DeepseekV3ForTokenClassification (DeepSeek-V3 model)
- DiffLlamaConfig configuration class: DiffLlamaForTokenClassification (DiffLlama model)
- DistilBertConfig configuration class: DistilBertForTokenClassification (DistilBERT model)
- ElectraConfig configuration class: ElectraForTokenClassification (ELECTRA model)
- ErnieConfig configuration class: ErnieForTokenClassification (ERNIE model)
- ErnieMConfig configuration class: ErnieMForTokenClassification (ErnieM model)
- EsmConfig configuration class: EsmForTokenClassification (ESM model)
- Exaone4Config configuration class: Exaone4ForTokenClassification (EXAONE-4.0 model)
- FNetConfig configuration class: FNetForTokenClassification (FNet model)
- FalconConfig configuration class: FalconForTokenClassification (Falcon model)
- FlaubertConfig configuration class: FlaubertForTokenClassification (FlauBERT model)
- FunnelConfig configuration class: FunnelForTokenClassification (Funnel Transformer model)
- GPT2Config configuration class: GPT2ForTokenClassification (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForTokenClassification (GPTBigCode model)
- GPTNeoConfig configuration class: GPTNeoForTokenClassification (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXForTokenClassification (GPT NeoX model)
- Gemma2Config configuration class: Gemma2ForTokenClassification (Gemma2 model)
- GemmaConfig configuration class: GemmaForTokenClassification (Gemma model)
- Glm4Config configuration class: Glm4ForTokenClassification (GLM4 model)
- GlmConfig configuration class: GlmForTokenClassification (GLM model)
- GptOssConfig configuration class: GptOssForTokenClassification (GptOss model)
- HeliumConfig configuration class: HeliumForTokenClassification (Helium model)
- IBertConfig configuration class: IBertForTokenClassification (I-BERT model)
- LayoutLMConfig configuration class: LayoutLMForTokenClassification (LayoutLM model)
- LayoutLMv2Config configuration class: LayoutLMv2ForTokenClassification (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3ForTokenClassification (LayoutLMv3 model)
- LiltConfig configuration class: LiltForTokenClassification (LiLT model)
- LlamaConfig configuration class: LlamaForTokenClassification (LLaMA model)
- LongformerConfig configuration class: LongformerForTokenClassification (Longformer model)
- LukeConfig configuration class: LukeForTokenClassification (LUKE model)
- MPNetConfig configuration class: MPNetForTokenClassification (MPNet model)
- MT5Config configuration class: MT5ForTokenClassification (MT5 model)
- MarkupLMConfig configuration class: MarkupLMForTokenClassification (MarkupLM model)
- MegaConfig configuration class: MegaForTokenClassification (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForTokenClassification (Megatron-BERT model)
- MiniMaxConfig configuration class: MiniMaxForTokenClassification (MiniMax model)
- MinistralConfig configuration class: MinistralForTokenClassification (Ministral model)
- MistralConfig configuration class: MistralForTokenClassification (Mistral model)
- MixtralConfig configuration class: MixtralForTokenClassification (Mixtral model)
- MobileBertConfig configuration class: MobileBertForTokenClassification (MobileBERT model)
- ModernBertConfig configuration class: ModernBertForTokenClassification (ModernBERT model)
- MptConfig configuration class: MptForTokenClassification (MPT model)
- MraConfig configuration class: MraForTokenClassification (MRA model)
- NemotronConfig configuration class: NemotronForTokenClassification (Nemotron model)
- NezhaConfig configuration class: NezhaForTokenClassification (Nezha model)
- NystromformerConfig configuration class: NystromformerForTokenClassification (Nyströmformer model)
- PersimmonConfig configuration class: PersimmonForTokenClassification (Persimmon model)
- Phi3Config configuration class: Phi3ForTokenClassification (Phi3 model)
- PhiConfig configuration class: PhiForTokenClassification (Phi model)
- QDQBertConfig configuration class: QDQBertForTokenClassification (QDQBert model)
- Qwen2Config configuration class: Qwen2ForTokenClassification (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeForTokenClassification (Qwen2MoE model)
- Qwen3Config configuration class: Qwen3ForTokenClassification (Qwen3 model)
- Qwen3MoeConfig configuration class: Qwen3MoeForTokenClassification (Qwen3MoE model)
- Qwen3NextConfig configuration class: Qwen3NextForTokenClassification (Qwen3Next model)
- RemBertConfig configuration class: RemBertForTokenClassification (RemBERT model)
- RoCBertConfig configuration class: RoCBertForTokenClassification (RoCBert model)
- RoFormerConfig configuration class: RoFormerForTokenClassification (RoFormer model)
- RobertaConfig configuration class: RobertaForTokenClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
- SeedOssConfig configuration class: SeedOssForTokenClassification (SeedOss model)
- SmolLM3Config configuration class: SmolLM3ForTokenClassification (SmolLM3 model)
- SqueezeBertConfig configuration class: SqueezeBertForTokenClassification (SqueezeBERT model)
- StableLmConfig configuration class: StableLmForTokenClassification (StableLm model)
- Starcoder2Config configuration class: Starcoder2ForTokenClassification (Starcoder2 model)
- T5Config configuration class: T5ForTokenClassification (T5 model)
- T5GemmaConfig configuration class: T5GemmaForTokenClassification (T5Gemma model)
- UMT5Config configuration class: UMT5ForTokenClassification (UMT5 model)
- XLMConfig configuration class: XLMForTokenClassification (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForTokenClassification (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForTokenClassification (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetForTokenClassification (XLNet model)
- XmodConfig configuration class: XmodForTokenClassification (X-MOD model)
- YosoConfig configuration class: YosoForTokenClassification (YOSO model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a token classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/.
- A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method.
- config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
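The kwargs routing described above can be sketched in plain Python. This is a simplified illustration only, not the actual transformers implementation; the ToyConfig class and split_kwargs helper are hypothetical names invented for this sketch:

```python
# Sketch: when no config is passed, from_pretrained splits kwargs in two —
# keys that match config attributes update the config, the rest are kept
# for the model's __init__.

class ToyConfig:
    def __init__(self, output_attentions=False, hidden_size=16):
        self.output_attentions = output_attentions
        self.hidden_size = hidden_size

def split_kwargs(config, **kwargs):
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)   # override a config attribute
        else:
            model_kwargs[key] = value     # left over for the model __init__
    return config, model_kwargs

config, model_kwargs = split_kwargs(ToyConfig(), output_attentions=True, custom_flag=1)
print(config.output_attentions)  # True
print(model_kwargs)              # {'custom_flag': 1}
```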
Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForTokenClassification (ALBERT model)
- apertus —
ApertusForTokenClassification(Apertus model) - arcee —
ArceeForTokenClassification(Arcee model) - bert — BertForTokenClassification (BERT model)
- big_bird — BigBirdForTokenClassification (BigBird model)
- biogpt — BioGptForTokenClassification (BioGpt model)
- bloom — BloomForTokenClassification (BLOOM model)
- bros — BrosForTokenClassification (BROS model)
- camembert — CamembertForTokenClassification (CamemBERT model)
- canine — CanineForTokenClassification (CANINE model)
- convbert — ConvBertForTokenClassification (ConvBERT model)
- data2vec-text — Data2VecTextForTokenClassification (Data2VecText model)
- deberta — DebertaForTokenClassification (DeBERTa model)
- deberta-v2 — DebertaV2ForTokenClassification (DeBERTa-v2 model)
- deepseek_v3 —
DeepseekV3ForTokenClassification(DeepSeek-V3 model) - diffllama —
DiffLlamaForTokenClassification(DiffLlama model) - distilbert —
DistilBertForTokenClassification(DistilBERT model) - electra —
ElectraForTokenClassification(ELECTRA model) - ernie —
ErnieForTokenClassification(ERNIE model) - ernie_m —
ErnieMForTokenClassification(ErnieM model) - esm —
EsmForTokenClassification(ESM model) - exaone4 —
Exaone4ForTokenClassification(EXAONE-4.0 model) - falcon —
FalconForTokenClassification(Falcon model) - flaubert —
FlaubertForTokenClassification(FlauBERT model) - fnet —
FNetForTokenClassification(FNet model) - funnel —
FunnelForTokenClassification(Funnel Transformer model) - gemma —
GemmaForTokenClassification(Gemma model) - gemma2 —
Gemma2ForTokenClassification(Gemma2 model) - glm —
GlmForTokenClassification(GLM model) - glm4 —
Glm4ForTokenClassification(GLM4 model) - gpt-sw3 —
GPT2ForTokenClassification(GPT-Sw3 model) - gpt2 —
GPT2ForTokenClassification(OpenAI GPT-2 model) - gpt_bigcode —
GPTBigCodeForTokenClassification(GPTBigCode model) - gpt_neo —
GPTNeoForTokenClassification(GPT Neo model) - gpt_neox —
GPTNeoXForTokenClassification(GPT NeoX model) - gpt_oss —
GptOssForTokenClassification(GptOss model) - helium —
HeliumForTokenClassification(Helium model) - ibert —
IBertForTokenClassification(I-BERT model) - layoutlm —
LayoutLMForTokenClassification(LayoutLM model) - layoutlmv2 —
LayoutLMv2ForTokenClassification(LayoutLMv2 model) - layoutlmv3 —
LayoutLMv3ForTokenClassification(LayoutLMv3 model) - lilt —
LiltForTokenClassification(LiLT model) - llama —
LlamaForTokenClassification(LLaMA model) - longformer —
LongformerForTokenClassification(Longformer model) - luke —
LukeForTokenClassification(LUKE model) - markuplm —
MarkupLMForTokenClassification(MarkupLM model) - mega —
MegaForTokenClassification(MEGA model) - megatron-bert —
MegatronBertForTokenClassification(Megatron-BERT model) - minimax —
MiniMaxForTokenClassification(MiniMax model) - ministral —
MinistralForTokenClassification(Ministral model) - mistral —
MistralForTokenClassification(Mistral model) - mixtral —
MixtralForTokenClassification(Mixtral model) - mobilebert —
MobileBertForTokenClassification(MobileBERT model) - modernbert —
ModernBertForTokenClassification(ModernBERT model) - mpnet —
MPNetForTokenClassification(MPNet model) - mpt —
MptForTokenClassification(MPT model) - mra —
MraForTokenClassification(MRA model) - mt5 —
MT5ForTokenClassification(MT5 model) - nemotron —
NemotronForTokenClassification(Nemotron model) - nezha —
NezhaForTokenClassification(Nezha model) - nystromformer —
NystromformerForTokenClassification(Nyströmformer model) - persimmon —
PersimmonForTokenClassification(Persimmon model) - phi —
PhiForTokenClassification(Phi model) - phi3 —
Phi3ForTokenClassification(Phi3 model) - qdqbert —
QDQBertForTokenClassification(QDQBert model) - qwen2 —
Qwen2ForTokenClassification(Qwen2 model) - qwen2_moe —
Qwen2MoeForTokenClassification(Qwen2MoE model) - qwen3 —
Qwen3ForTokenClassification(Qwen3 model) - qwen3_moe —
Qwen3MoeForTokenClassification(Qwen3MoE model) - qwen3_next —
Qwen3NextForTokenClassification(Qwen3Next model) - rembert —
RemBertForTokenClassification(RemBERT model) - roberta —
RobertaForTokenClassification(RoBERTa model) - roberta-prelayernorm —
RobertaPreLayerNormForTokenClassification(RoBERTa-PreLayerNorm model) - roc_bert —
RoCBertForTokenClassification(RoCBert model) - roformer —
RoFormerForTokenClassification(RoFormer model) - seed_oss —
SeedOssForTokenClassification(SeedOss model) - smollm3 —
SmolLM3ForTokenClassification(SmolLM3 model) - squeezebert —
SqueezeBertForTokenClassification(SqueezeBERT model) - stablelm —
StableLmForTokenClassification(StableLm model) - starcoder2 —
Starcoder2ForTokenClassification(Starcoder2 model) - t5 —
T5ForTokenClassification(T5 model) - t5gemma —
T5GemmaForTokenClassification(T5Gemma model) - umt5 —
UMT5ForTokenClassification(UMT5 model) - xlm —
XLMForTokenClassification(XLM model) - xlm-roberta —
XLMRobertaForTokenClassification(XLM-RoBERTa model) - xlm-roberta-xl —
XLMRobertaXLForTokenClassification(XLM-RoBERTa-XL model) - xlnet —
XLNetForTokenClassification(XLNet model) - xmod —
XmodForTokenClassification(X-MOD model) - yoso —
YosoForTokenClassification(YOSO model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
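The eval/train toggle follows standard torch.nn.Module semantics; a minimal stand-in class (illustrative only, not the real nn.Module) shows the effect of the training flag:

```python
# Minimal stand-in for the nn.Module train()/eval() toggle: from_pretrained
# returns the model with training=False, so dropout-style layers are inactive.

class ToyModule:
    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)

model = ToyModule().eval()   # roughly what from_pretrained does on return
print(model.training)        # False
model.train()                # switch back before fine-tuning
print(model.training)        # True
```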
Examples:
>>> from transformers import AutoConfig, AutoModelForTokenClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForTokenClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForTokenClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForTokenClassification (ALBERT model)
- BertConfig configuration class: TFBertForTokenClassification (BERT model)
- CamembertConfig configuration class: TFCamembertForTokenClassification (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertForTokenClassification (ConvBERT model)
- DebertaConfig configuration class: TFDebertaForTokenClassification (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2ForTokenClassification (DeBERTa-v2 model)
- DistilBertConfig configuration class: TFDistilBertForTokenClassification (DistilBERT model)
- ElectraConfig configuration class: TFElectraForTokenClassification (ELECTRA model)
- EsmConfig configuration class: TFEsmForTokenClassification (ESM model)
- FlaubertConfig configuration class: TFFlaubertForTokenClassification (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForTokenClassification (Funnel Transformer model)
- LayoutLMConfig configuration class: TFLayoutLMForTokenClassification (LayoutLM model)
- LayoutLMv3Config configuration class: TFLayoutLMv3ForTokenClassification (LayoutLMv3 model)
- LongformerConfig configuration class: TFLongformerForTokenClassification (Longformer model)
- MPNetConfig configuration class: TFMPNetForTokenClassification (MPNet model)
- MobileBertConfig configuration class: TFMobileBertForTokenClassification (MobileBERT model)
- RemBertConfig configuration class: TFRemBertForTokenClassification (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForTokenClassification (RoFormer model)
- RobertaConfig configuration class: TFRobertaForTokenClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
- XLMConfig configuration class: TFXLMForTokenClassification (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForTokenClassification (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetForTokenClassification (XLNet model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a token classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method.
- config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForTokenClassification (ALBERT model)
- bert — TFBertForTokenClassification (BERT model)
- camembert — TFCamembertForTokenClassification (CamemBERT model)
- convbert — TFConvBertForTokenClassification (ConvBERT model)
- deberta — TFDebertaForTokenClassification (DeBERTa model)
- deberta-v2 — TFDebertaV2ForTokenClassification (DeBERTa-v2 model)
- distilbert —
TFDistilBertForTokenClassification(DistilBERT model) - electra —
TFElectraForTokenClassification(ELECTRA model) - esm —
TFEsmForTokenClassification(ESM model) - flaubert —
TFFlaubertForTokenClassification(FlauBERT model) - funnel —
TFFunnelForTokenClassification(Funnel Transformer model) - layoutlm —
TFLayoutLMForTokenClassification(LayoutLM model) - layoutlmv3 —
TFLayoutLMv3ForTokenClassification(LayoutLMv3 model) - longformer —
TFLongformerForTokenClassification(Longformer model) - mobilebert —
TFMobileBertForTokenClassification(MobileBERT model) - mpnet —
TFMPNetForTokenClassification(MPNet model) - rembert —
TFRemBertForTokenClassification(RemBERT model) - roberta —
TFRobertaForTokenClassification(RoBERTa model) - roberta-prelayernorm —
TFRobertaPreLayerNormForTokenClassification(RoBERTa-PreLayerNorm model) - roformer —
TFRoFormerForTokenClassification(RoFormer model) - xlm —
TFXLMForTokenClassification(XLM model) - xlm-roberta —
TFXLMRobertaForTokenClassification(XLM-RoBERTa model) - xlnet —
TFXLNetForTokenClassification(XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForTokenClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForTokenClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForTokenClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a token classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForTokenClassification (ALBERT model)
- BertConfig configuration class: FlaxBertForTokenClassification (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForTokenClassification (BigBird model)
- DistilBertConfig configuration class: FlaxDistilBertForTokenClassification (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraForTokenClassification (ELECTRA model)
- RoFormerConfig configuration class: FlaxRoFormerForTokenClassification (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForTokenClassification (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForTokenClassification (RoBERTa-PreLayerNorm model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForTokenClassification (XLM-RoBERTa model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a token classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args, **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method.
- config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it is loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a token classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForTokenClassification (ALBERT model)
- bert — FlaxBertForTokenClassification (BERT model)
- big_bird — FlaxBigBirdForTokenClassification (BigBird model)
- distilbert —
FlaxDistilBertForTokenClassification(DistilBERT model) - electra —
FlaxElectraForTokenClassification(ELECTRA model) - roberta —
FlaxRobertaForTokenClassification(RoBERTa model) - roberta-prelayernorm —
FlaxRobertaPreLayerNormForTokenClassification(RoBERTa-PreLayerNorm model) - roformer —
FlaxRoFormerForTokenClassification(RoFormer model) - xlm-roberta —
FlaxXLMRobertaForTokenClassification(XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForTokenClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForTokenClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForTokenClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForQuestionAnswering (ALBERT model)
- ArceeConfig configuration class: ArceeForQuestionAnswering (Arcee model)
- BartConfig configuration class: BartForQuestionAnswering (BART model)
- BertConfig configuration class: BertForQuestionAnswering (BERT model)
- BigBirdConfig configuration class: BigBirdForQuestionAnswering (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusForQuestionAnswering (BigBird-Pegasus model)
- BloomConfig configuration class: BloomForQuestionAnswering (BLOOM model)
- CamembertConfig configuration class: CamembertForQuestionAnswering (CamemBERT model)
- CanineConfig configuration class: CanineForQuestionAnswering (CANINE model)
- ConvBertConfig configuration class: ConvBertForQuestionAnswering (ConvBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForQuestionAnswering (Data2VecText model)
- DebertaConfig configuration class: DebertaForQuestionAnswering (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForQuestionAnswering (DeBERTa-v2 model)
- DiffLlamaConfig configuration class: DiffLlamaForQuestionAnswering (DiffLlama model)
- DistilBertConfig configuration class: DistilBertForQuestionAnswering (DistilBERT model)
- ElectraConfig configuration class: ElectraForQuestionAnswering (ELECTRA model)
- ErnieConfig configuration class: ErnieForQuestionAnswering (ERNIE model)
- ErnieMConfig configuration class: ErnieMForQuestionAnswering (ErnieM model)
- Exaone4Config configuration class: Exaone4ForQuestionAnswering (EXAONE-4.0 model)
- FNetConfig configuration class: FNetForQuestionAnswering (FNet model)
- FalconConfig configuration class: FalconForQuestionAnswering (Falcon model)
- FlaubertConfig configuration class: FlaubertForQuestionAnsweringSimple (FlauBERT model)
- FunnelConfig configuration class: FunnelForQuestionAnswering (Funnel Transformer model)
- GPT2Config configuration class: GPT2ForQuestionAnswering (OpenAI GPT-2 model)
- GPTJConfig configuration class: GPTJForQuestionAnswering (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoForQuestionAnswering (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXForQuestionAnswering (GPT NeoX model)
- IBertConfig configuration class: IBertForQuestionAnswering (I-BERT model)
- LEDConfig configuration class: LEDForQuestionAnswering (LED model)
- LayoutLMv2Config configuration class: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- LiltConfig configuration class: LiltForQuestionAnswering (LiLT model)
- LlamaConfig configuration class: LlamaForQuestionAnswering (LLaMA model)
- LongformerConfig configuration class: LongformerForQuestionAnswering (Longformer model)
- LukeConfig configuration class: LukeForQuestionAnswering (LUKE model)
- LxmertConfig configuration class: LxmertForQuestionAnswering (LXMERT model)
- MBartConfig configuration class: MBartForQuestionAnswering (mBART model)
- MPNetConfig configuration class: MPNetForQuestionAnswering (MPNet model)
- MT5Config configuration class: MT5ForQuestionAnswering (MT5 model)
- MarkupLMConfig configuration class: MarkupLMForQuestionAnswering (MarkupLM model)
- MegaConfig configuration class: MegaForQuestionAnswering (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForQuestionAnswering (Megatron-BERT model)
- MiniMaxConfig configuration class: MiniMaxForQuestionAnswering (MiniMax model)
- MinistralConfig configuration class: MinistralForQuestionAnswering (Ministral model)
- MistralConfig configuration class: MistralForQuestionAnswering (Mistral model)
- MixtralConfig configuration class: MixtralForQuestionAnswering (Mixtral model)
- MobileBertConfig configuration class: MobileBertForQuestionAnswering (MobileBERT model)
- ModernBertConfig configuration class: ModernBertForQuestionAnswering (ModernBERT model)
- MptConfig configuration class: MptForQuestionAnswering (MPT model)
- MraConfig configuration class: MraForQuestionAnswering (MRA model)
- MvpConfig configuration class: MvpForQuestionAnswering (MVP model)
- NemotronConfig configuration class: NemotronForQuestionAnswering (Nemotron model)
- NezhaConfig configuration class: NezhaForQuestionAnswering (Nezha model)
- NystromformerConfig configuration class: NystromformerForQuestionAnswering (Nyströmformer model)
- OPTConfig configuration class: OPTForQuestionAnswering (OPT model)
- QDQBertConfig configuration class: QDQBertForQuestionAnswering (QDQBert model)
- Qwen2Config configuration class: Qwen2ForQuestionAnswering (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeForQuestionAnswering (Qwen2MoE model)
- Qwen3Config configuration class: Qwen3ForQuestionAnswering (Qwen3 model)
- Qwen3MoeConfig configuration class: Qwen3MoeForQuestionAnswering (Qwen3MoE model)
- Qwen3NextConfig configuration class: Qwen3NextForQuestionAnswering (Qwen3Next model)
- ReformerConfig configuration class: ReformerForQuestionAnswering (Reformer model)
- RemBertConfig configuration class: RemBertForQuestionAnswering (RemBERT model)
- RoCBertConfig configuration class: RoCBertForQuestionAnswering (RoCBert model)
- RoFormerConfig configuration class: RoFormerForQuestionAnswering (RoFormer model)
- RobertaConfig configuration class: RobertaForQuestionAnswering (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- SeedOssConfig configuration class: SeedOssForQuestionAnswering (SeedOss model)
- SmolLM3Config configuration class: SmolLM3ForQuestionAnswering (SmolLM3 model)
- SplinterConfig configuration class: SplinterForQuestionAnswering (Splinter model)
- SqueezeBertConfig configuration class: SqueezeBertForQuestionAnswering (SqueezeBERT model)
- T5Config configuration class: T5ForQuestionAnswering (T5 model)
- UMT5Config configuration class: UMT5ForQuestionAnswering (UMT5 model)
- XLMConfig configuration class: XLMForQuestionAnsweringSimple (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForQuestionAnswering (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForQuestionAnswering (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetForQuestionAnsweringSimple (XLNet model)
- XmodConfig configuration class: XmodForQuestionAnswering (X-MOD model)
- YosoConfig configuration class: YosoForQuestionAnswering (YOSO model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
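As a minimal sketch of this dispatch, from_config builds the architecture selected by the configuration class without loading any pretrained weights. The tiny hidden sizes below are arbitrary, chosen only to keep the example fast:

```python
from transformers import BertConfig, AutoModelForQuestionAnswering

# A deliberately tiny BERT configuration (sizes are arbitrary, for speed only).
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# from_config dispatches on the configuration class: BertConfig selects
# BertForQuestionAnswering. The weights are randomly initialized, NOT pretrained.
model = AutoModelForQuestionAnswering.from_config(config)
print(type(model).__name__)  # BertForQuestionAnswering
```

Because no checkpoint is fetched, this is also a cheap way to inspect which concrete class an auto class would instantiate for a given configuration.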
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint into a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults toFalse) — Load the model weights from a TensorFlow checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertForQuestionAnswering (ALBERT model)
- arcee — ArceeForQuestionAnswering (Arcee model)
- bart — BartForQuestionAnswering (BART model)
- bert — BertForQuestionAnswering (BERT model)
- big_bird — BigBirdForQuestionAnswering (BigBird model)
- bigbird_pegasus — BigBirdPegasusForQuestionAnswering (BigBird-Pegasus model)
- bloom — BloomForQuestionAnswering (BLOOM model)
- camembert — CamembertForQuestionAnswering (CamemBERT model)
- canine — CanineForQuestionAnswering (CANINE model)
- convbert — ConvBertForQuestionAnswering (ConvBERT model)
- data2vec-text — Data2VecTextForQuestionAnswering (Data2VecText model)
- deberta — DebertaForQuestionAnswering (DeBERTa model)
- deberta-v2 — DebertaV2ForQuestionAnswering (DeBERTa-v2 model)
- diffllama — DiffLlamaForQuestionAnswering (DiffLlama model)
- distilbert — DistilBertForQuestionAnswering (DistilBERT model)
- electra — ElectraForQuestionAnswering (ELECTRA model)
- ernie — ErnieForQuestionAnswering (ERNIE model)
- ernie_m — ErnieMForQuestionAnswering (ErnieM model)
- exaone4 — Exaone4ForQuestionAnswering (EXAONE-4.0 model)
- falcon — FalconForQuestionAnswering (Falcon model)
- flaubert — FlaubertForQuestionAnsweringSimple (FlauBERT model)
- fnet — FNetForQuestionAnswering (FNet model)
- funnel — FunnelForQuestionAnswering (Funnel Transformer model)
- gpt2 — GPT2ForQuestionAnswering (OpenAI GPT-2 model)
- gpt_neo — GPTNeoForQuestionAnswering (GPT Neo model)
- gpt_neox — GPTNeoXForQuestionAnswering (GPT NeoX model)
- gptj — GPTJForQuestionAnswering (GPT-J model)
- ibert — IBertForQuestionAnswering (I-BERT model)
- layoutlmv2 — LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- led — LEDForQuestionAnswering (LED model)
- lilt — LiltForQuestionAnswering (LiLT model)
- llama — LlamaForQuestionAnswering (LLaMA model)
- longformer — LongformerForQuestionAnswering (Longformer model)
- luke — LukeForQuestionAnswering (LUKE model)
- lxmert — LxmertForQuestionAnswering (LXMERT model)
- markuplm — MarkupLMForQuestionAnswering (MarkupLM model)
- mbart — MBartForQuestionAnswering (mBART model)
- mega — MegaForQuestionAnswering (MEGA model)
- megatron-bert — MegatronBertForQuestionAnswering (Megatron-BERT model)
- minimax — MiniMaxForQuestionAnswering (MiniMax model)
- ministral — MinistralForQuestionAnswering (Ministral model)
- mistral — MistralForQuestionAnswering (Mistral model)
- mixtral — MixtralForQuestionAnswering (Mixtral model)
- mobilebert — MobileBertForQuestionAnswering (MobileBERT model)
- modernbert — ModernBertForQuestionAnswering (ModernBERT model)
- mpnet — MPNetForQuestionAnswering (MPNet model)
- mpt — MptForQuestionAnswering (MPT model)
- mra — MraForQuestionAnswering (MRA model)
- mt5 — MT5ForQuestionAnswering (MT5 model)
- mvp — MvpForQuestionAnswering (MVP model)
- nemotron — NemotronForQuestionAnswering (Nemotron model)
- nezha — NezhaForQuestionAnswering (Nezha model)
- nystromformer — NystromformerForQuestionAnswering (Nyströmformer model)
- opt — OPTForQuestionAnswering (OPT model)
- qdqbert — QDQBertForQuestionAnswering (QDQBert model)
- qwen2 — Qwen2ForQuestionAnswering (Qwen2 model)
- qwen2_moe — Qwen2MoeForQuestionAnswering (Qwen2MoE model)
- qwen3 — Qwen3ForQuestionAnswering (Qwen3 model)
- qwen3_moe — Qwen3MoeForQuestionAnswering (Qwen3MoE model)
- qwen3_next — Qwen3NextForQuestionAnswering (Qwen3Next model)
- reformer — ReformerForQuestionAnswering (Reformer model)
- rembert — RemBertForQuestionAnswering (RemBERT model)
- roberta — RobertaForQuestionAnswering (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForQuestionAnswering (RoCBert model)
- roformer — RoFormerForQuestionAnswering (RoFormer model)
- seed_oss — SeedOssForQuestionAnswering (SeedOss model)
- smollm3 — SmolLM3ForQuestionAnswering (SmolLM3 model)
- splinter — SplinterForQuestionAnswering (Splinter model)
- squeezebert — SqueezeBertForQuestionAnswering (SqueezeBERT model)
- t5 — T5ForQuestionAnswering (T5 model)
- umt5 — UMT5ForQuestionAnswering (UMT5 model)
- xlm — XLMForQuestionAnsweringSimple (XLM model)
- xlm-roberta — XLMRobertaForQuestionAnswering (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForQuestionAnswering (XLM-RoBERTa-XL model)
- xlnet — XLNetForQuestionAnsweringSimple (XLNet model)
- xmod — XmodForQuestionAnswering (X-MOD model)
- yoso — YosoForQuestionAnswering (YOSO model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
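The eval/train switch described above is standard torch.nn.Module behavior and can be exercised without downloading any checkpoint; a minimal sketch using an arbitrary tiny BertConfig:

```python
from transformers import BertConfig, AutoModelForQuestionAnswering

# Tiny, arbitrary config: we only need a model instance, not pretrained weights.
config = BertConfig(hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = AutoModelForQuestionAnswering.from_config(config)

model.eval()                  # disable dropout etc. for inference
assert not model.training
model.train()                 # re-enable training-time behavior (dropout active)
assert model.training
```

Note that unlike from_pretrained, which puts the model in evaluation mode for you, a model built this way follows the torch default, so call model.eval() explicitly before inference.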
Examples:
>>> from transformers import AutoConfig, AutoModelForQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForQuestionAnswering.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForQuestionAnswering (ALBERT model)
- BertConfig configuration class: TFBertForQuestionAnswering (BERT model)
- CamembertConfig configuration class: TFCamembertForQuestionAnswering (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertForQuestionAnswering (ConvBERT model)
- DebertaConfig configuration class: TFDebertaForQuestionAnswering (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2ForQuestionAnswering (DeBERTa-v2 model)
- DistilBertConfig configuration class: TFDistilBertForQuestionAnswering (DistilBERT model)
- ElectraConfig configuration class: TFElectraForQuestionAnswering (ELECTRA model)
- FlaubertConfig configuration class: TFFlaubertForQuestionAnsweringSimple (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForQuestionAnswering (Funnel Transformer model)
- GPTJConfig configuration class: TFGPTJForQuestionAnswering (GPT-J model)
- LayoutLMv3Config configuration class: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- LongformerConfig configuration class: TFLongformerForQuestionAnswering (Longformer model)
- MPNetConfig configuration class: TFMPNetForQuestionAnswering (MPNet model)
- MobileBertConfig configuration class: TFMobileBertForQuestionAnswering (MobileBERT model)
- RemBertConfig configuration class: TFRemBertForQuestionAnswering (RemBERT model)
- RoFormerConfig configuration class: TFRoFormerForQuestionAnswering (RoFormer model)
- RobertaConfig configuration class: TFRobertaForQuestionAnswering (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- XLMConfig configuration class: TFXLMForQuestionAnsweringSimple (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetForQuestionAnsweringSimple (XLNet model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — TFAlbertForQuestionAnswering (ALBERT model)
- bert — TFBertForQuestionAnswering (BERT model)
- camembert — TFCamembertForQuestionAnswering (CamemBERT model)
- convbert — TFConvBertForQuestionAnswering (ConvBERT model)
- deberta — TFDebertaForQuestionAnswering (DeBERTa model)
- deberta-v2 — TFDebertaV2ForQuestionAnswering (DeBERTa-v2 model)
- distilbert — TFDistilBertForQuestionAnswering (DistilBERT model)
- electra — TFElectraForQuestionAnswering (ELECTRA model)
- flaubert — TFFlaubertForQuestionAnsweringSimple (FlauBERT model)
- funnel — TFFunnelForQuestionAnswering (Funnel Transformer model)
- gptj — TFGPTJForQuestionAnswering (GPT-J model)
- layoutlmv3 — TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- longformer — TFLongformerForQuestionAnswering (Longformer model)
- mobilebert — TFMobileBertForQuestionAnswering (MobileBERT model)
- mpnet — TFMPNetForQuestionAnswering (MPNet model)
- rembert — TFRemBertForQuestionAnswering (RemBERT model)
- roberta — TFRobertaForQuestionAnswering (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerForQuestionAnswering (RoFormer model)
- xlm — TFXLMForQuestionAnsweringSimple (XLM model)
- xlm-roberta — TFXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
- xlnet — TFXLNetForQuestionAnsweringSimple (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForQuestionAnswering.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForQuestionAnswering (ALBERT model)
- BartConfig configuration class: FlaxBartForQuestionAnswering (BART model)
- BertConfig configuration class: FlaxBertForQuestionAnswering (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForQuestionAnswering (BigBird model)
- DistilBertConfig configuration class: FlaxDistilBertForQuestionAnswering (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraForQuestionAnswering (ELECTRA model)
- MBartConfig configuration class: FlaxMBartForQuestionAnswering (mBART model)
- RoFormerConfig configuration class: FlaxRoFormerForQuestionAnswering (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForQuestionAnswering (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model into a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code leaves in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
- If a configuration is provided with
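The kwargs routing described above can be sketched as a small pure-Python helper. This is an illustration only, with a hypothetical name (split_kwargs); the real routing lives inside from_pretrained():

```python
def split_kwargs(config_attrs, kwargs, config_provided):
    """Mimic how from_pretrained() routes **kwargs (illustrative sketch).

    If a config is provided, every kwarg goes to the model's __init__.
    Otherwise, keys matching configuration attributes become config
    overrides and the remainder is forwarded to the model's __init__.
    """
    if config_provided:
        return {}, dict(kwargs)
    config_overrides = {k: v for k, v in kwargs.items() if k in config_attrs}
    model_kwargs = {k: v for k, v in kwargs.items() if k not in config_attrs}
    return config_overrides, model_kwargs


# output_attentions is a config attribute; my_extra (hypothetical) is not.
overrides, model_kwargs = split_kwargs(
    {"output_attentions", "hidden_size"},
    {"output_attentions": True, "my_extra": 1},
    config_provided=False,
)
```

With no config supplied, output_attentions overrides the loaded configuration while my_extra falls through to the model constructor; with config supplied, both would be passed straight to __init__.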
Instantiate one of the model classes of the library (with a question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- albert — FlaxAlbertForQuestionAnswering (ALBERT model)
- bart — FlaxBartForQuestionAnswering (BART model)
- bert — FlaxBertForQuestionAnswering (BERT model)
- big_bird — FlaxBigBirdForQuestionAnswering (BigBird model)
- distilbert — FlaxDistilBertForQuestionAnswering (DistilBERT model)
- electra — FlaxElectraForQuestionAnswering (ELECTRA model)
- mbart — FlaxMBartForQuestionAnswering (mBART model)
- roberta — FlaxRobertaForQuestionAnswering (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForQuestionAnswering (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerForQuestionAnswering (RoFormer model)
- xlm-roberta — FlaxXLMRobertaForQuestionAnswering (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForQuestionAnswering.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForTextEncoding
TFAutoModelForTextEncoding
Computer vision
以下の自動クラスは、次のコンピュータービジョンタスクに利用可能です。
AutoModelForDepthEstimation
This is a generic model class that will be instantiated as one of the model classes of the library (with a depth estimation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
  The model class to instantiate is selected based on the configuration class:
  - DPTConfig configuration class: DPTForDepthEstimation (DPT model)
  - DepthAnythingConfig configuration class: DepthAnythingForDepthEstimation (Depth Anything model)
  - DepthProConfig configuration class: DepthProForDepthEstimation (DepthPro model)
  - GLPNConfig configuration class: GLPNForDepthEstimation (GLPN model)
  - PromptDepthAnythingConfig configuration class: PromptDepthAnythingForDepthEstimation (PromptDepthAnything model)
  - ZoeDepthConfig configuration class: ZoeDepthForDepthEstimation (ZoeDepth model)
- attn_implementation (str, optional) —
  The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a depth estimation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
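The dispatch mechanism just described (pick the model class from the configuration class, without touching any weights) can be illustrated with a minimal mock. The class names below are stand-ins, not the real transformers registry:

```python
# Stub config and model classes standing in for e.g. DPTConfig / GLPNConfig
# and their depth-estimation heads (hypothetical names for illustration).
class DPTConfigStub: ...
class GLPNConfigStub: ...


class DPTForDepthEstimationStub:
    def __init__(self, config):
        self.config = config


class GLPNForDepthEstimationStub:
    def __init__(self, config):
        self.config = config


# Mapping from configuration class to model class, as in the table above.
_MAPPING = {
    DPTConfigStub: DPTForDepthEstimationStub,
    GLPNConfigStub: GLPNForDepthEstimationStub,
}


def from_config(config):
    """Select and build the model class keyed by type(config).

    Like AutoModelForDepthEstimation.from_config(), this only constructs
    the architecture; no pretrained weights are loaded.
    """
    model_cls = _MAPPING[type(config)]
    return model_cls(config)


model = from_config(GLPNConfigStub())
```

A config class not present in the mapping would raise KeyError here; the real auto class raises a ValueError listing the supported configurations.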
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) —
  Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
  Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) —
  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
  A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) —
  Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) —
  Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) —
  Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download —
  Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) —
  A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) —
  Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) —
  Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") —
  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) —
  Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") —
  The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) —
  Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a depth estimation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- depth_anything — DepthAnythingForDepthEstimation (Depth Anything model)
- depth_pro — DepthProForDepthEstimation (DepthPro model)
- dpt — DPTForDepthEstimation (DPT model)
- glpn — GLPNForDepthEstimation (GLPN model)
- prompt_depth_anything — PromptDepthAnythingForDepthEstimation (PromptDepthAnything model)
- zoedepth — ZoeDepthForDepthEstimation (ZoeDepth model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
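The eval/train switch described above can be illustrated with a toy module (a hypothetical stand-in for torch.nn.Dropout; real models use torch modules, and the dropout mask here is a crude deterministic sketch):

```python
class ToyModel:
    """Toy module illustrating the training-mode flag (not real torch)."""

    def __init__(self):
        self.training = True  # modules built directly start in training mode

    def eval(self):
        self.training = False  # from_pretrained() returns models in this state
        return self

    def train(self):
        self.training = True
        return self

    def forward(self, xs):
        if self.training:
            # Crude deterministic "dropout": zero every other element.
            return [0.0 if i % 2 else x for i, x in enumerate(xs)]
        return list(xs)  # dropout is a no-op in evaluation mode


m = ToyModel().eval()
out_eval = m.forward([1.0, 2.0, 3.0])          # unchanged in eval mode
out_train = m.train().forward([1.0, 2.0, 3.0])  # dropout active again
```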
Examples:
>>> from transformers import AutoConfig, AutoModelForDepthEstimation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForDepthEstimation.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForDepthEstimation.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForDepthEstimation.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
AutoModelForImageClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BeitConfig configuration class: BeitForImageClassification (BEiT model)
- BitConfig configuration class: BitForImageClassification (BiT model)
- CLIPConfig configuration class:
CLIPForImageClassification(CLIP model) - ConvNextConfig configuration class: ConvNextForImageClassification (ConvNeXT model)
- ConvNextV2Config configuration class: ConvNextV2ForImageClassification (ConvNeXTV2 model)
- CvtConfig configuration class: CvtForImageClassification (CvT model)
- Data2VecVisionConfig configuration class: Data2VecVisionForImageClassification (Data2VecVision model)
- DeiTConfig configuration class: DeiTForImageClassification or DeiTForImageClassificationWithTeacher (DeiT model)
- DinatConfig configuration class: DinatForImageClassification (DiNAT model)
- Dinov2Config configuration class: Dinov2ForImageClassification (DINOv2 model)
- Dinov2WithRegistersConfig configuration class: Dinov2WithRegistersForImageClassification (DINOv2 with Registers model)
- DonutSwinConfig configuration class: DonutSwinForImageClassification (DonutSwin model)
- EfficientFormerConfig configuration class: EfficientFormerForImageClassification or EfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
- EfficientNetConfig configuration class: EfficientNetForImageClassification (EfficientNet model)
- FocalNetConfig configuration class: FocalNetForImageClassification (FocalNet model)
- HGNetV2Config configuration class: HGNetV2ForImageClassification (HGNet-V2 model)
- HieraConfig configuration class: HieraForImageClassification (Hiera model)
- IJepaConfig configuration class: IJepaForImageClassification (I-JEPA model)
- ImageGPTConfig configuration class: ImageGPTForImageClassification (ImageGPT model)
- LevitConfig configuration class: LevitForImageClassification or LevitForImageClassificationWithTeacher (LeViT model)
- MetaClip2Config configuration class: MetaClip2ForImageClassification (MetaCLIP 2 model)
- MobileNetV1Config configuration class: MobileNetV1ForImageClassification (MobileNetV1 model)
- MobileNetV2Config configuration class: MobileNetV2ForImageClassification (MobileNetV2 model)
- MobileViTConfig configuration class: MobileViTForImageClassification (MobileViT model)
- MobileViTV2Config configuration class: MobileViTV2ForImageClassification (MobileViTV2 model)
- NatConfig configuration class: NatForImageClassification (NAT model)
- PerceiverConfig configuration class: PerceiverForImageClassificationLearned or PerceiverForImageClassificationFourier or PerceiverForImageClassificationConvProcessing (Perceiver model)
- PoolFormerConfig configuration class: PoolFormerForImageClassification (PoolFormer model)
- PvtConfig configuration class: PvtForImageClassification (PVT model)
- PvtV2Config configuration class: PvtV2ForImageClassification (PVTv2 model)
- RegNetConfig configuration class: RegNetForImageClassification (RegNet model)
- ResNetConfig configuration class: ResNetForImageClassification (ResNet model)
- SegformerConfig configuration class: SegformerForImageClassification (SegFormer model)
- ShieldGemma2Config configuration class: ShieldGemma2ForImageClassification (Shieldgemma2 model)
- Siglip2Config configuration class: Siglip2ForImageClassification (SigLIP2 model)
- SiglipConfig configuration class: SiglipForImageClassification (SigLIP model)
- SwiftFormerConfig configuration class: SwiftFormerForImageClassification (SwiftFormer model)
- SwinConfig configuration class: SwinForImageClassification (Swin Transformer model)
- Swinv2Config configuration class: Swinv2ForImageClassification (Swin Transformer V2 model)
- TextNetConfig configuration class: TextNetForImageClassification (TextNet model)
- TimmWrapperConfig configuration class: TimmWrapperForImageClassification (TimmWrapperModel model)
- VanConfig configuration class: VanForImageClassification (VAN model)
- ViTConfig configuration class: ViTForImageClassification (ViT model)
- ViTHybridConfig configuration class: ViTHybridForImageClassification (ViT Hybrid model)
- ViTMSNConfig configuration class: ViTMSNForImageClassification (ViTMSN model)
- attn_implementation (str, optional) —
  The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an image classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
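The default attention selection described above (SDPA for torch>=2.1.1 when available, otherwise "eager") can be sketched as a small helper. This is a simplified illustration with a hypothetical name; it assumes a plain "major.minor.patch" version string, and the real logic lives inside transformers:

```python
def default_attn_implementation(torch_version: str, sdpa_available: bool) -> str:
    """Pick "sdpa" when available on torch>=2.1.1, else fall back to "eager".

    Simplified sketch: assumes a bare "major.minor.patch" version string
    (real torch versions may carry suffixes like "+cu118").
    """
    major, minor, patch = (int(p) for p in torch_version.split(".")[:3])
    if sdpa_available and (major, minor, patch) >= (2, 1, 1):
        return "sdpa"
    return "eager"
```

Passing attn_implementation="eager" (or "flash_attention_2") to from_config()/from_pretrained() overrides this default entirely.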
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) —
  Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
  Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) —
  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
  A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) —
  Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) —
  Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) —
  Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download —
  Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) —
  A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) —
  Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) —
  Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") —
  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) —
  Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") —
  The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) —
  Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- beit — BeitForImageClassification (BEiT model)
- bit — BitForImageClassification (BiT model)
- clip — CLIPForImageClassification (CLIP model)
- convnext — ConvNextForImageClassification (ConvNeXT model)
- convnextv2 — ConvNextV2ForImageClassification (ConvNeXTV2 model)
- cvt — CvtForImageClassification (CvT model)
- data2vec-vision — Data2VecVisionForImageClassification (Data2VecVision model)
- deit — DeiTForImageClassification or DeiTForImageClassificationWithTeacher (DeiT model)
- dinat — DinatForImageClassification (DiNAT model)
- dinov2 — Dinov2ForImageClassification (DINOv2 model)
- dinov2_with_registers — Dinov2WithRegistersForImageClassification (DINOv2 with Registers model)
- donut-swin — DonutSwinForImageClassification (DonutSwin model)
- efficientformer — EfficientFormerForImageClassification or EfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
- efficientnet — EfficientNetForImageClassification (EfficientNet model)
- focalnet — FocalNetForImageClassification (FocalNet model)
- hgnet_v2 — HGNetV2ForImageClassification (HGNet-V2 model)
- hiera — HieraForImageClassification (Hiera model)
- ijepa — IJepaForImageClassification (I-JEPA model)
- imagegpt — ImageGPTForImageClassification (ImageGPT model)
- levit — LevitForImageClassification or LevitForImageClassificationWithTeacher (LeViT model)
- metaclip_2 — MetaClip2ForImageClassification (MetaCLIP 2 model)
- mobilenet_v1 — MobileNetV1ForImageClassification (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2ForImageClassification (MobileNetV2 model)
- mobilevit — MobileViTForImageClassification (MobileViT model)
- mobilevitv2 — MobileViTV2ForImageClassification (MobileViTV2 model)
- nat — NatForImageClassification (NAT model)
- perceiver — PerceiverForImageClassificationLearned or PerceiverForImageClassificationFourier or PerceiverForImageClassificationConvProcessing (Perceiver model)
- poolformer — PoolFormerForImageClassification (PoolFormer model)
- pvt — PvtForImageClassification (PVT model)
- pvt_v2 — PvtV2ForImageClassification (PVTv2 model)
- regnet — RegNetForImageClassification (RegNet model)
- resnet — ResNetForImageClassification (ResNet model)
- segformer — SegformerForImageClassification (SegFormer model)
- shieldgemma2 — ShieldGemma2ForImageClassification (Shieldgemma2 model)
- siglip — SiglipForImageClassification (SigLIP model)
- siglip2 — Siglip2ForImageClassification (SigLIP2 model)
- swiftformer — SwiftFormerForImageClassification (SwiftFormer model)
- swin — SwinForImageClassification (Swin Transformer model)
- swinv2 — Swinv2ForImageClassification (Swin Transformer V2 model)
- textnet — TextNetForImageClassification (TextNet model)
- timm_wrapper — TimmWrapperForImageClassification (TimmWrapperModel model)
- van — VanForImageClassification (VAN model)
- vit — ViTForImageClassification (ViT model)
- vit_hybrid — ViTHybridForImageClassification (ViT Hybrid model)
- vit_msn — ViTMSNForImageClassification (ViTMSN model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForImageClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForImageClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- ConvNextConfig configuration class: TFConvNextForImageClassification (ConvNeXT model)
- ConvNextV2Config configuration class: TFConvNextV2ForImageClassification (ConvNeXTV2 model)
- CvtConfig configuration class: TFCvtForImageClassification (CvT model)
- Data2VecVisionConfig configuration class: TFData2VecVisionForImageClassification (Data2VecVision model)
- DeiTConfig configuration class: TFDeiTForImageClassification or TFDeiTForImageClassificationWithTeacher (DeiT model)
- EfficientFormerConfig configuration class: TFEfficientFormerForImageClassification or TFEfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
- MobileViTConfig configuration class: TFMobileViTForImageClassification (MobileViT model)
- RegNetConfig configuration class: TFRegNetForImageClassification (RegNet model)
- ResNetConfig configuration class: TFResNetForImageClassification (ResNet model)
- SegformerConfig configuration class: TFSegformerForImageClassification (SegFormer model)
- SwiftFormerConfig configuration class: TFSwiftFormerForImageClassification (SwiftFormer model)
- SwinConfig configuration class: TFSwinForImageClassification (Swin Transformer model)
- ViTConfig configuration class: TFViTForImageClassification (ViT model)
- attn_implementation (str, optional) —
  The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an image classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g,
./pt_model/pytorch_model.bin). In this case,from_ptshould be set toTrueand a configuration object should be provided asconfigargument. This loading path is slower than converting the PyTorch model in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults toFalse) — Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only(
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to"main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to"main") — The specific revision to use for the code on the Hub, if the code leaves in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevisioncan be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether aconfigis provided or automatically loaded:- If a configuration is provided with
config,**kwargswill be directly passed to the underlying model’s__init__method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargswill be first passed to the configuration class initialization function (from_pretrained()). Each key ofkwargsthat corresponds to a configuration attribute will be used to override said attribute with the suppliedkwargsvalue. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s__init__function.
Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- convnext — TFConvNextForImageClassification (ConvNeXT model)
- convnextv2 — TFConvNextV2ForImageClassification (ConvNeXTV2 model)
- cvt — TFCvtForImageClassification (CvT model)
- data2vec-vision — TFData2VecVisionForImageClassification (Data2VecVision model)
- deit — TFDeiTForImageClassification or TFDeiTForImageClassificationWithTeacher (DeiT model)
- efficientformer — TFEfficientFormerForImageClassification or TFEfficientFormerForImageClassificationWithTeacher (EfficientFormer model)
- mobilevit — TFMobileViTForImageClassification (MobileViT model)
- regnet — TFRegNetForImageClassification (RegNet model)
- resnet — TFResNetForImageClassification (ResNet model)
- segformer — TFSegformerForImageClassification (SegFormer model)
- swiftformer — TFSwiftFormerForImageClassification (SwiftFormer model)
- swin — TFSwinForImageClassification (Swin Transformer model)
- vit — TFViTForImageClassification (ViT model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForImageClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForImageClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModelForImageClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an image classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BeitConfig configuration class: FlaxBeitForImageClassification (BEiT model)
- Dinov2Config configuration class: FlaxDinov2ForImageClassification (DINOv2 model)
- RegNetConfig configuration class: FlaxRegNetForImageClassification (RegNet model)
- ResNetConfig configuration class: FlaxResNetForImageClassification (ResNet model)
- ViTConfig configuration class: FlaxViTForImageClassification (ViT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an image classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch checkpoint to a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an image classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- beit — FlaxBeitForImageClassification (BEiT model)
- dinov2 — FlaxDinov2ForImageClassification (DINOv2 model)
- regnet — FlaxRegNetForImageClassification (RegNet model)
- resnet — FlaxResNetForImageClassification (ResNet model)
- vit — FlaxViTForImageClassification (ViT model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForImageClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForImageClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForImageClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForVideoClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a video classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- TimesformerConfig configuration class: TimesformerForVideoClassification (TimeSformer model)
- VJEPA2Config configuration class: VJEPA2ForVideoClassification (VJEPA2Model model)
- VideoMAEConfig configuration class: VideoMAEForVideoClassification (VideoMAE model)
- VivitConfig configuration class: VivitForVideoClassification (ViViT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a video classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
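The from_config() path can be exercised without any download; a minimal sketch, assuming transformers and torch are installed (the tiny hyperparameters below are illustrative, not a real checkpoint):

```python
# Build a randomly initialized video-classification model from a config
# alone -- no weights are fetched from the Hub. The auto class resolves
# VideoMAEConfig to VideoMAEForVideoClassification, per the mapping above.
from transformers import AutoModelForVideoClassification, VideoMAEConfig

config = VideoMAEConfig(
    image_size=32,          # shrunk from the defaults for a quick smoke test
    num_frames=4,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=3,           # size of the classification head
)
model = AutoModelForVideoClassification.from_config(config)
print(type(model).__name__)  # VideoMAEForVideoClassification
```

Because only the configuration is used, the weights are freshly initialized; use from_pretrained() when you need trained weights.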
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a tensorflow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a video classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- timesformer — TimesformerForVideoClassification (TimeSformer model)
- videomae — VideoMAEForVideoClassification (VideoMAE model)
- vivit — VivitForVideoClassification (ViViT model)
- vjepa2 — VJEPA2ForVideoClassification (VJEPA2Model model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
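The eval/train toggle described above is plain PyTorch module state, so it can be sketched without loading any checkpoint (the module below is a stand-in for a model returned by from_pretrained()):

```python
import torch.nn as nn

# A stand-in for a loaded model: from_pretrained() returns models in
# eval mode, so dropout layers are inactive until train() is called.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
model.eval()
assert not model.training          # inference mode: dropout disabled

model.train()                      # switch back before fine-tuning
assert model.training              # dropout active again
```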
Examples:
>>> from transformers import AutoConfig, AutoModelForVideoClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForVideoClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForVideoClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForVideoClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
AutoModelForMaskedImageModeling
This is a generic model class that will be instantiated as one of the model classes of the library (with a masked image modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- DeiTConfig configuration class: DeiTForMaskedImageModeling (DeiT model)
- FocalNetConfig configuration class: FocalNetForMaskedImageModeling (FocalNet model)
- SwinConfig configuration class: SwinForMaskedImageModeling (Swin Transformer model)
- Swinv2Config configuration class: Swinv2ForMaskedImageModeling (Swin Transformer V2 model)
- ViTConfig configuration class: ViTForMaskedImageModeling (ViT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a masked image modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a tensorflow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a masked image modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- deit — DeiTForMaskedImageModeling (DeiT model)
- focalnet — FocalNetForMaskedImageModeling (FocalNet model)
- swin — SwinForMaskedImageModeling (Swin Transformer model)
- swinv2 — Swinv2ForMaskedImageModeling (Swin Transformer V2 model)
- vit — ViTForMaskedImageModeling (ViT model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForMaskedImageModeling
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForMaskedImageModeling.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForMaskedImageModeling
This is a generic model class that will be instantiated as one of the model classes of the library (with a masked image modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- DeiTConfig configuration class: TFDeiTForMaskedImageModeling (DeiT model)
- SwinConfig configuration class: TFSwinForMaskedImageModeling (Swin Transformer model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a masked image modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch checkpoint to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys, and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded: - If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a masked image modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- deit — TFDeiTForMaskedImageModeling (DeiT model)
- swin — TFSwinForMaskedImageModeling (Swin Transformer model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForMaskedImageModeling
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForMaskedImageModeling.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForObjectDetection
This is a generic model class that will be instantiated as one of the model classes of the library (with an object detection head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- ConditionalDetrConfig configuration class: ConditionalDetrForObjectDetection (Conditional DETR model)
- DFineConfig configuration class: DFineForObjectDetection (D-FINE model)
- DabDetrConfig configuration class: DabDetrForObjectDetection (DAB-DETR model)
- DeformableDetrConfig configuration class: DeformableDetrForObjectDetection (Deformable DETR model)
- DetaConfig configuration class: DetaForObjectDetection (DETA model)
- DetrConfig configuration class: DetrForObjectDetection (DETR model)
- RTDetrConfig configuration class: RTDetrForObjectDetection (RT-DETR model)
- RTDetrV2Config configuration class: RTDetrV2ForObjectDetection (RT-DETRv2 model)
- TableTransformerConfig configuration class: TableTransformerForObjectDetection (Table Transformer model)
- YolosConfig configuration class: YolosForObjectDetection (YOLOS model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an object detection head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
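A minimal sketch of the note above, using the DETR classes listed in the mapping (the configuration values are illustrative, and the backbone flags are set so that nothing is downloaded):

```python
from transformers import AutoModelForObjectDetection, DetrConfig

# Build a configuration by hand; disable the pretrained timm backbone so that
# no weights are fetched from the Hub or from timm.
config = DetrConfig(
    num_labels=10,
    use_timm_backbone=False,
    use_pretrained_backbone=False,
)

# from_config() only reads the configuration: the auto class resolves it to
# DetrForObjectDetection, and all of its weights are randomly initialized.
model = AutoModelForObjectDetection.from_config(config)
print(type(model).__name__)  # DetrForObjectDetection
print(model.config.num_labels)  # 10
```

To also load pretrained weights, use from_pretrained() with a checkpoint instead.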
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an object detection head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- conditional_detr — ConditionalDetrForObjectDetection (Conditional DETR model)
- d_fine — DFineForObjectDetection (D-FINE model)
- dab-detr — DabDetrForObjectDetection (DAB-DETR model)
- deformable_detr — DeformableDetrForObjectDetection (Deformable DETR model)
- deta — DetaForObjectDetection (DETA model)
- detr — DetrForObjectDetection (DETR model)
- rt_detr — RTDetrForObjectDetection (RT-DETR model)
- rt_detr_v2 — RTDetrV2ForObjectDetection (RT-DETRv2 model)
- table-transformer — TableTransformerForObjectDetection (Table Transformer model)
- yolos — YolosForObjectDetection (YOLOS model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForObjectDetection
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForObjectDetection.from_pretrained("facebook/detr-resnet-50")
>>> # Update configuration during loading
>>> model = AutoModelForObjectDetection.from_pretrained("facebook/detr-resnet-50", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForObjectDetection.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForImageSegmentation
This is a generic model class that will be instantiated as one of the model classes of the library (with an image segmentation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- DetrConfig configuration class: DetrForSegmentation (DETR model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an image segmentation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an image segmentation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- detr — DetrForSegmentation (DETR model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForImageSegmentation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic")
>>> # Update configuration during loading
>>> model = AutoModelForImageSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageSegmentation.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForImageToImage
AutoModelForSemanticSegmentation
This is a generic model class that will be instantiated as one of the model classes of the library (with a semantic segmentation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BeitConfig configuration class: BeitForSemanticSegmentation (BEiT model)
- DPTConfig configuration class: DPTForSemanticSegmentation (DPT model)
- Data2VecVisionConfig configuration class: Data2VecVisionForSemanticSegmentation (Data2VecVision model)
- MobileNetV2Config configuration class: MobileNetV2ForSemanticSegmentation (MobileNetV2 model)
- MobileViTConfig configuration class: MobileViTForSemanticSegmentation (MobileViT model)
- MobileViTV2Config configuration class: MobileViTV2ForSemanticSegmentation (MobileViTV2 model)
- SegformerConfig configuration class: SegformerForSemanticSegmentation (SegFormer model)
- UperNetConfig configuration class: UperNetForSemanticSegmentation (UPerNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a semantic segmentation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
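To see the input/output contract of a semantic segmentation head without downloading a checkpoint, a tiny randomly initialized SegFormer can be built from its configuration (all sizes below are illustrative, not a real checkpoint):

```python
import torch
from transformers import AutoModelForSemanticSegmentation, SegformerConfig

# Deliberately tiny SegFormer: the hidden sizes must stay divisible by the
# default per-stage attention head counts [1, 2, 5, 8].
config = SegformerConfig(
    num_labels=5,
    hidden_sizes=[8, 16, 20, 32],
    decoder_hidden_size=32,
)
model = AutoModelForSemanticSegmentation.from_config(config)
model.eval()  # from_pretrained() would have done this already

pixel_values = torch.randn(1, 3, 64, 64)  # (batch, channels, height, width)
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# SegFormer predicts logits at 1/4 of the input resolution, one channel per label.
print(outputs.logits.shape)  # torch.Size([1, 5, 16, 16])
```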
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a semantic segmentation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- beit — BeitForSemanticSegmentation (BEiT model)
- data2vec-vision — Data2VecVisionForSemanticSegmentation (Data2VecVision model)
- dpt — DPTForSemanticSegmentation (DPT model)
- mobilenet_v2 — MobileNetV2ForSemanticSegmentation (MobileNetV2 model)
- mobilevit — MobileViTForSemanticSegmentation (MobileViT model)
- mobilevitv2 — MobileViTV2ForSemanticSegmentation (MobileViTV2 model)
- segformer — SegformerForSemanticSegmentation (SegFormer model)
- upernet — UperNetForSemanticSegmentation (UPerNet model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForSemanticSegmentation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
>>> # Update configuration during loading
>>> model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSemanticSegmentation.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForSemanticSegmentation
This is a generic model class that will be instantiated as one of the model classes of the library (with a semantic segmentation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Data2VecVisionConfig configuration class: TFData2VecVisionForSemanticSegmentation (Data2VecVision model)
- MobileViTConfig configuration class: TFMobileViTForSemanticSegmentation (MobileViT model)
- SegformerConfig configuration class: TFSegformerForSemanticSegmentation (SegFormer model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a semantic segmentation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
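The same configuration-only path works on the TensorFlow side; a sketch with an illustrative tiny SegFormer configuration (no weights are downloaded, and TensorFlow must be installed):

```python
from transformers import SegformerConfig, TFAutoModelForSemanticSegmentation

# Tiny illustrative configuration; hidden sizes stay divisible by the
# default per-stage attention head counts [1, 2, 5, 8].
config = SegformerConfig(num_labels=3, hidden_sizes=[8, 16, 20, 32], decoder_hidden_size=32)

# The TF auto class resolves the same SegformerConfig to its TF counterpart.
model = TFAutoModelForSemanticSegmentation.from_config(config)
print(type(model).__name__)  # TFSegformerForSemanticSegmentation
```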
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a semantic segmentation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- data2vec-vision — TFData2VecVisionForSemanticSegmentation (Data2VecVision model)
- mobilevit — TFMobileViTForSemanticSegmentation (MobileViT model)
- segformer — TFSegformerForSemanticSegmentation (SegFormer model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForSemanticSegmentation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
>>> # Update configuration during loading
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForInstanceSegmentation
This is a generic model class that will be instantiated as one of the model classes of the library (with an instance segmentation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- MaskFormerConfig configuration class: MaskFormerForInstanceSegmentation (MaskFormer model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an instance segmentation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an instance segmentation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- maskformer — MaskFormerForInstanceSegmentation (MaskFormer model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForInstanceSegmentation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForInstanceSegmentation.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForInstanceSegmentation.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForInstanceSegmentation.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
AutoModelForUniversalSegmentation
This is a generic model class that will be instantiated as one of the model classes of the library (with a universal image segmentation head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- DetrConfig configuration class: DetrForSegmentation (DETR model)
- EomtConfig configuration class: EomtForUniversalSegmentation (EoMT model)
- Mask2FormerConfig configuration class: Mask2FormerForUniversalSegmentation (Mask2Former model)
- MaskFormerConfig configuration class: MaskFormerForInstanceSegmentation (MaskFormer model)
- OneFormerConfig configuration class: OneFormerForUniversalSegmentation (OneFormer model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a universal image segmentation head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a universal image segmentation head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- detr — DetrForSegmentation (DETR model)
- eomt — EomtForUniversalSegmentation (EoMT model)
- mask2former — Mask2FormerForUniversalSegmentation (Mask2Former model)
- maskformer — MaskFormerForInstanceSegmentation (MaskFormer model)
- oneformer — OneFormerForUniversalSegmentation (OneFormer model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForUniversalSegmentation
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForUniversalSegmentation.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForUniversalSegmentation.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForUniversalSegmentation.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
AutoModelForZeroShotImageClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot image classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlignConfig configuration class: AlignModel (ALIGN model)
- AltCLIPConfig configuration class: AltCLIPModel (AltCLIP model)
- Blip2Config configuration class: Blip2ForImageTextRetrieval (BLIP-2 model)
- BlipConfig configuration class: BlipModel (BLIP model)
- CLIPConfig configuration class: CLIPModel (CLIP model)
- CLIPSegConfig configuration class: CLIPSegModel (CLIPSeg model)
- ChineseCLIPConfig configuration class: ChineseCLIPModel (Chinese-CLIP model)
- MetaClip2Config configuration class: MetaClip2Model (MetaCLIP 2 model)
- Siglip2Config configuration class: Siglip2Model (SigLIP2 model)
- SiglipConfig configuration class: SiglipModel (SigLIP model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a zero-shot image classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a zero-shot image classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- align — AlignModel (ALIGN model)
- altclip — AltCLIPModel (AltCLIP model)
- blip — BlipModel (BLIP model)
- blip-2 — Blip2ForImageTextRetrieval (BLIP-2 model)
- chinese_clip — ChineseCLIPModel (Chinese-CLIP model)
- clip — CLIPModel (CLIP model)
- clipseg — CLIPSegModel (CLIPSeg model)
- metaclip_2 — MetaClip2Model (MetaCLIP 2 model)
- siglip — SiglipModel (SigLIP model)
- siglip2 — Siglip2Model (SigLIP2 model)
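The pattern-matching fallback described above can be sketched as a substring search over the mapping keys. This is a simplified illustration only, not the library’s actual code; in particular, trying longer keys first (so that, e.g., "siglip2" is not shadowed by "siglip") is a simplifying assumption of this sketch, and the class names are represented as plain strings.

```python
# Simplified sketch of the pattern-matching fallback on
# pretrained_model_name_or_path when the config has no model_type.
# Longer keys are tried first so that e.g. "siglip2" is not shadowed
# by "siglip". Illustration only.
ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING = {
    "align": "AlignModel",
    "altclip": "AltCLIPModel",
    "blip-2": "Blip2ForImageTextRetrieval",
    "blip": "BlipModel",
    "chinese_clip": "ChineseCLIPModel",
    "clipseg": "CLIPSegModel",
    "clip": "CLIPModel",
    "metaclip_2": "MetaClip2Model",
    "siglip2": "Siglip2Model",
    "siglip": "SiglipModel",
}

def resolve_by_pattern(name_or_path):
    """Return the first class whose key occurs in the model name or path."""
    for pattern in sorted(ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING, key=len, reverse=True):
        if pattern in name_or_path:
            return ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING[pattern]
    raise ValueError(f"Could not infer a model class from {name_or_path!r}")

print(resolve_by_pattern("google/siglip2-base-patch16-224"))  # Siglip2Model
```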
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForZeroShotImageClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForZeroShotImageClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForZeroShotImageClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForZeroShotImageClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForZeroShotImageClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot image classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BlipConfig configuration class: TFBlipModel (BLIP model)
- CLIPConfig configuration class: TFCLIPModel (CLIP model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a zero-shot image classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a zero-shot image classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- blip — TFBlipModel (BLIP model)
- clip — TFCLIPModel (CLIP model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForZeroShotImageClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForZeroShotImageClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForZeroShotObjectDetection
This is a generic model class that will be instantiated as one of the model classes of the library (with a zero-shot object detection head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- GroundingDinoConfig configuration class: GroundingDinoForObjectDetection (Grounding DINO model)
- MMGroundingDinoConfig configuration class: MMGroundingDinoForObjectDetection (MM Grounding DINO model)
- OmDetTurboConfig configuration class: OmDetTurboForObjectDetection (OmDet-Turbo model)
- OwlViTConfig configuration class: OwlViTForObjectDetection (OWL-ViT model)
- Owlv2Config configuration class: Owlv2ForObjectDetection (OWLv2 model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a zero-shot object detection head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
stroros.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:- If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done). - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a zero-shot object detection head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- grounding-dino — GroundingDinoForObjectDetection (Grounding DINO model)
- mm-grounding-dino — MMGroundingDinoForObjectDetection (MM Grounding DINO model)
- omdet-turbo — OmDetTurboForObjectDetection (OmDet-Turbo model)
- owlv2 — Owlv2ForObjectDetection (OWLv2 model)
- owlvit — OwlViTForObjectDetection (OWL-ViT model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForZeroShotObjectDetection
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained("google/owlvit-base-patch32")
>>> # Update configuration during loading
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained("google/owlvit-base-patch32", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForZeroShotObjectDetection.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )Audio
The following auto classes are available for the audio tasks below.
AutoModelForAudioClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an audio classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- ASTConfig configuration class: ASTForAudioClassification (Audio Spectrogram Transformer model)
- Data2VecAudioConfig configuration class: Data2VecAudioForSequenceClassification (Data2VecAudio model)
- HubertConfig configuration class: HubertForSequenceClassification (Hubert model)
- SEWConfig configuration class: SEWForSequenceClassification (SEW model)
- SEWDConfig configuration class: SEWDForSequenceClassification (SEW-D model)
- UniSpeechConfig configuration class: UniSpeechForSequenceClassification (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatForSequenceClassification (UniSpeechSat model)
- Wav2Vec2BertConfig configuration class: Wav2Vec2BertForSequenceClassification (Wav2Vec2-BERT model)
- Wav2Vec2Config configuration class: Wav2Vec2ForSequenceClassification (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForSequenceClassification (Wav2Vec2-Conformer model)
- WavLMConfig configuration class: WavLMForSequenceClassification (WavLM model)
- WhisperConfig configuration class: WhisperForAudioClassification (Whisper model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an audio classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
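As a minimal sketch of the behavior described above (the hyperparameter values are deliberately tiny and purely illustrative), from_config dispatches on the configuration class and honors extra config attributes such as num_labels:

```python
from transformers import AutoModelForAudioClassification, Wav2Vec2Config

# A deliberately tiny config so the randomly initialized model is cheap
# to build; no weights are downloaded.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=4,
)
model = AutoModelForAudioClassification.from_config(config)
print(type(model).__name__)  # Wav2Vec2ForSequenceClassification
print(model.config.num_labels)  # 4
```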
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:- If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done). - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an audio classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- audio-spectrogram-transformer — ASTForAudioClassification (Audio Spectrogram Transformer model)
- data2vec-audio — Data2VecAudioForSequenceClassification (Data2VecAudio model)
- hubert — HubertForSequenceClassification (Hubert model)
- sew — SEWForSequenceClassification (SEW model)
- sew-d — SEWDForSequenceClassification (SEW-D model)
- unispeech — UniSpeechForSequenceClassification (UniSpeech model)
- unispeech-sat — UniSpeechSatForSequenceClassification (UniSpeechSat model)
- wav2vec2 — Wav2Vec2ForSequenceClassification (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertForSequenceClassification (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerForSequenceClassification (Wav2Vec2-Conformer model)
- wavlm — WavLMForSequenceClassification (WavLM model)
- whisper — WhisperForAudioClassification (Whisper model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForAudioClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioClassification.from_pretrained("superb/wav2vec2-base-superb-ks")
>>> # Update configuration during loading
>>> model = AutoModelForAudioClassification.from_pretrained("superb/wav2vec2-base-superb-ks", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForAudioClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )TFAutoModelForAudioClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an audio classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Wav2Vec2Config configuration class: TFWav2Vec2ForSequenceClassification (Wav2Vec2 model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an audio classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:- If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done). - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an audio classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- wav2vec2 — TFWav2Vec2ForSequenceClassification (Wav2Vec2 model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForAudioClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base-960h")
>>> # Update configuration during loading
>>> model = TFAutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base-960h", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForAudioClassification.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )AutoModelForAudioFrameClassification
This is a generic model class that will be instantiated as one of the model classes of the library (with an audio frame (token) classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Data2VecAudioConfig configuration class: Data2VecAudioForAudioFrameClassification (Data2VecAudio model)
- UniSpeechSatConfig configuration class: UniSpeechSatForAudioFrameClassification (UniSpeechSat model)
- Wav2Vec2BertConfig configuration class: Wav2Vec2BertForAudioFrameClassification (Wav2Vec2-BERT model)
- Wav2Vec2Config configuration class: Wav2Vec2ForAudioFrameClassification (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForAudioFrameClassification (Wav2Vec2-Conformer model)
- WavLMConfig configuration class: WavLMForAudioFrameClassification (WavLM model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an audio frame (token) classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
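A minimal sketch of the same pattern for frame-level classification (the tiny hyperparameters are purely illustrative): the model produced by from_config predicts one label per audio frame rather than one label per clip.

```python
from transformers import AutoModelForAudioFrameClassification, Wav2Vec2Config

# Tiny illustrative config; no pretrained weights are loaded.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,  # e.g. speech / non-speech per frame
)
model = AutoModelForAudioFrameClassification.from_config(config)
print(type(model).__name__)  # Wav2Vec2ForAudioFrameClassification
```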
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument). - force_download (
bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info (
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:- If a configuration is provided with
config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done). - If a configuration is not provided,
kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an audio frame (token) classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- data2vec-audio — Data2VecAudioForAudioFrameClassification (Data2VecAudio model)
- unispeech-sat — UniSpeechSatForAudioFrameClassification (UniSpeechSat model)
- wav2vec2 — Wav2Vec2ForAudioFrameClassification (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertForAudioFrameClassification (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerForAudioFrameClassification (Wav2Vec2-Conformer model)
- wavlm — WavLMForAudioFrameClassification (WavLM model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForAudioFrameClassification
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioFrameClassification.from_pretrained("facebook/wav2vec2-base")
>>> # Update configuration during loading
>>> model = AutoModelForAudioFrameClassification.from_pretrained("facebook/wav2vec2-base", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForAudioFrameClassification.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )AutoModelForCTC
This is a generic model class that will be instantiated as one of the model classes of the library (with a connectionist temporal classification head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Data2VecAudioConfig configuration class: Data2VecAudioForCTC (Data2VecAudio model)
- HubertConfig configuration class: HubertForCTC (Hubert model)
- MCTCTConfig configuration class: MCTCTForCTC (M-CTC-T model)
- ParakeetCTCConfig configuration class: ParakeetForCTC (Parakeet model)
- SEWConfig configuration class: SEWForCTC (SEW model)
- SEWDConfig configuration class: SEWDForCTC (SEW-D model)
- UniSpeechConfig configuration class: UniSpeechForCTC (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatForCTC (UniSpeechSat model)
- Wav2Vec2BertConfig configuration class: Wav2Vec2BertForCTC (Wav2Vec2-BERT model)
- Wav2Vec2Config configuration class: Wav2Vec2ForCTC (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForCTC (Wav2Vec2-Conformer model)
- WavLMConfig configuration class: WavLMForCTC (WavLM model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a connectionist temporal classification head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
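As a minimal sketch of the config-only path (assuming transformers and torch are installed; the default Wav2Vec2Config is used purely for illustration, not as a recommended configuration):

```python
from transformers import AutoModelForCTC, Wav2Vec2Config

# Build a randomly initialized CTC model from a configuration alone -- no weights are downloaded.
config = Wav2Vec2Config()
model = AutoModelForCTC.from_config(config)

# The auto class resolved the config class to the matching architecture.
print(type(model).__name__)  # Wav2Vec2ForCTC
```

To get pretrained weights instead of random initialization, use from_pretrained() with a checkpoint name.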
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a connectionist temporal classification head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- data2vec-audio — Data2VecAudioForCTC (Data2VecAudio model)
- hubert — HubertForCTC (Hubert model)
- mctct — MCTCTForCTC (M-CTC-T model)
- parakeet_ctc — ParakeetForCTC (Parakeet model)
- sew — SEWForCTC (SEW model)
- sew-d — SEWDForCTC (SEW-D model)
- unispeech — UniSpeechForCTC (UniSpeech model)
- unispeech-sat — UniSpeechSatForCTC (UniSpeechSat model)
- wav2vec2 — Wav2Vec2ForCTC (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertForCTC (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerForCTC (Wav2Vec2-Conformer model)
- wavlm — WavLMForCTC (WavLM model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForCTC
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h")
>>> # Update configuration during loading
>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForCTC.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForSpeechSeq2Seq
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- DiaConfig configuration class: DiaForConditionalGeneration (Dia model)
- GraniteSpeechConfig configuration class: GraniteSpeechForConditionalGeneration (GraniteSpeech model)
- KyutaiSpeechToTextConfig configuration class: KyutaiSpeechToTextForConditionalGeneration (KyutaiSpeechToText model)
- MoonshineConfig configuration class: MoonshineForConditionalGeneration (Moonshine model)
- Pop2PianoConfig configuration class: Pop2PianoForConditionalGeneration (Pop2Piano model)
- SeamlessM4TConfig configuration class: SeamlessM4TForSpeechToText (SeamlessM4T model)
- SeamlessM4Tv2Config configuration class: SeamlessM4Tv2ForSpeechToText (SeamlessM4Tv2 model)
- Speech2TextConfig configuration class: Speech2TextForConditionalGeneration (Speech2Text model)
- SpeechEncoderDecoderConfig configuration class: SpeechEncoderDecoderModel (Speech Encoder decoder model)
- SpeechT5Config configuration class: SpeechT5ForSpeechToText (SpeechT5 model)
- WhisperConfig configuration class: WhisperForConditionalGeneration (Whisper model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
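A quick sketch of the same config-only path for the speech seq2seq auto class (assuming transformers and torch are installed; the default WhisperConfig is a stand-in used only to show the class resolution):

```python
from transformers import AutoModelForSpeechSeq2Seq, WhisperConfig

# Configuration only: builds the architecture with randomly initialized weights, no download.
config = WhisperConfig()
model = AutoModelForSpeechSeq2Seq.from_config(config)

# WhisperConfig maps to the Whisper seq2seq architecture.
print(type(model).__name__)  # WhisperForConditionalGeneration
```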
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- dia — DiaForConditionalGeneration (Dia model)
- granite_speech — GraniteSpeechForConditionalGeneration (GraniteSpeech model)
- kyutai_speech_to_text — KyutaiSpeechToTextForConditionalGeneration (KyutaiSpeechToText model)
- moonshine — MoonshineForConditionalGeneration (Moonshine model)
- pop2piano — Pop2PianoForConditionalGeneration (Pop2Piano model)
- seamless_m4t — SeamlessM4TForSpeechToText (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2ForSpeechToText (SeamlessM4Tv2 model)
- speech-encoder-decoder — SpeechEncoderDecoderModel (Speech Encoder decoder model)
- speech_to_text — Speech2TextForConditionalGeneration (Speech2Text model)
- speecht5 — SpeechT5ForSpeechToText (SpeechT5 model)
- whisper — WhisperForConditionalGeneration (Whisper model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
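The mode toggle can be sketched without downloading anything by building a randomly initialized model from a default config (a stand-in for a pretrained checkpoint; from_pretrained() calls model.eval() for you):

```python
from transformers import AutoModelForSpeechSeq2Seq, WhisperConfig

model = AutoModelForSpeechSeq2Seq.from_config(WhisperConfig())

model.eval()             # inference mode: dropout modules deactivated
assert not model.training

model.train()            # restore training behavior before fine-tuning
assert model.training
```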
Examples:
>>> from transformers import AutoConfig, AutoModelForSpeechSeq2Seq
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")
>>> # Update configuration during loading
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForSpeechSeq2Seq.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

TFAutoModelForSpeechSeq2Seq
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Speech2TextConfig configuration class: TFSpeech2TextForConditionalGeneration (Speech2Text model)
- WhisperConfig configuration class: TFWhisperForConditionalGeneration (Whisper model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- speech_to_text — TFSpeech2TextForConditionalGeneration (Speech2Text model)
- whisper — TFWhisperForConditionalGeneration (Whisper model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForSpeechSeq2Seq
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")
>>> # Update configuration during loading
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForSpeechSeq2Seq.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForSpeechSeq2Seq
This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- SpeechEncoderDecoderConfig configuration class: FlaxSpeechEncoderDecoderModel (Speech Encoder decoder model)
- WhisperConfig configuration class: FlaxWhisperForConditionalGeneration (Whisper model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a sequence-to-sequence speech-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- speech-encoder-decoder — FlaxSpeechEncoderDecoderModel (Speech Encoder decoder model)
- whisper — FlaxWhisperForConditionalGeneration (Whisper model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForSpeechSeq2Seq
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForSpeechSeq2Seq.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForAudioXVector
This is a generic model class that will be instantiated as one of the model classes of the library (with an audio retrieval via x-vector head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Data2VecAudioConfig configuration class: Data2VecAudioForXVector (Data2VecAudio model)
- UniSpeechSatConfig configuration class: UniSpeechSatForXVector (UniSpeechSat model)
- Wav2Vec2BertConfig configuration class: Wav2Vec2BertForXVector (Wav2Vec2-BERT model)
- Wav2Vec2Config configuration class: Wav2Vec2ForXVector (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForXVector (Wav2Vec2-Conformer model)
- WavLMConfig configuration class: WavLMForXVector (WavLM model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an audio retrieval via x-vector head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
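A minimal sketch of the config-only path for this auto class (assuming transformers and torch are installed; Wav2Vec2Config with its defaults is used only to illustrate the resolution, since Wav2Vec2 is one of the supported x-vector architectures):

```python
from transformers import AutoModelForAudioXVector, Wav2Vec2Config

# Build an x-vector model from a configuration alone; weights are randomly initialized.
config = Wav2Vec2Config()
model = AutoModelForAudioXVector.from_config(config)

print(type(model).__name__)  # Wav2Vec2ForXVector
```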
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with an audio retrieval via x-vector head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- data2vec-audio — Data2VecAudioForXVector (Data2VecAudio model)
- unispeech-sat — UniSpeechSatForXVector (UniSpeechSat model)
- wav2vec2 — Wav2Vec2ForXVector (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertForXVector (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerForXVector (Wav2Vec2-Conformer model)
- wavlm — WavLMForXVector (WavLM model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
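The training/evaluation switch can be seen on a model built locally from a tiny, illustrative config; in this sketch we assume from_config, like a plain module constructor, leaves the model in training mode (only from_pretrained applies model.eval() for you):

```python
from transformers import AutoModelForAudioXVector, Wav2Vec2Config

# Tiny local config (illustrative sizes); no checkpoint is downloaded.
config = Wav2Vec2Config(
    hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=64
)
model = AutoModelForAudioXVector.from_config(config)

model.eval()   # what from_pretrained() does by default: dropout etc. deactivated
print(model.training)  # False

model.train()  # switch back before fine-tuning
print(model.training)  # True
```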
Examples:
>>> from transformers import AutoConfig, AutoModelForAudioXVector
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForAudioXVector.from_pretrained("facebook/wav2vec2-base")
>>> # Update configuration during loading
>>> model = AutoModelForAudioXVector.from_pretrained("facebook/wav2vec2-base", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/wav2vec2_tf_model_config.json")
>>> model = AutoModelForAudioXVector.from_pretrained(
...     "./tf_model/wav2vec2_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
AutoModelForTextToSpectrogram
AutoModelForTextToWaveform
Multimodal
The following auto classes are available for the multimodal tasks below.
AutoModelForTableQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a table question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- TapasConfig configuration class: TapasForQuestionAnswering (TAPAS model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a table question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
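As with the other auto classes, from_config can be exercised entirely offline; a sketch with a locally built TapasConfig whose tiny sizes are illustrative only:

```python
from transformers import AutoModelForTableQuestionAnswering, TapasConfig

# Tiny local TAPAS config (illustrative sizes); no weights are downloaded.
config = TapasConfig(
    hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=64
)
model = AutoModelForTableQuestionAnswering.from_config(config)
print(type(model).__name__)  # TapasForQuestionAnswering
```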
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a table question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- tapas — TapasForQuestionAnswering (TAPAS model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForTableQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")
>>> # Update configuration during loading
>>> model = AutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/tapas_tf_model_config.json")
>>> model = AutoModelForTableQuestionAnswering.from_pretrained(
... "./tf_model/tapas_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForTableQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a table question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- TapasConfig configuration class: TFTapasForQuestionAnswering (TAPAS model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a table question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a table question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- tapas — TFTapasForQuestionAnswering (TAPAS model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForTableQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")
>>> # Update configuration during loading
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/tapas_pt_model_config.json")
>>> model = TFAutoModelForTableQuestionAnswering.from_pretrained(
... "./pt_model/tapas_pytorch_model.bin", from_pt=True, config=config
... )
AutoModelForDocumentQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a document question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- LayoutLMConfig configuration class: LayoutLMForQuestionAnswering (LayoutLM model)
- LayoutLMv2Config configuration class: LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a document question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
Examples:
>>> from transformers import AutoConfig, AutoModelForDocumentQuestionAnswering
>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> model = AutoModelForDocumentQuestionAnswering.from_config(config)
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__() method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from the saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a document question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- layoutlm — LayoutLMForQuestionAnswering (LayoutLM model)
- layoutlmv2 — LayoutLMv2ForQuestionAnswering (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForDocumentQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> # Update configuration during loading
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/layoutlm_tf_model_config.json")
>>> model = AutoModelForDocumentQuestionAnswering.from_pretrained(
... "./tf_model/layoutlm_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
TFAutoModelForDocumentQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a document question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- LayoutLMConfig configuration class: TFLayoutLMForQuestionAnswering (LayoutLM model)
- LayoutLMv3Config configuration class: TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a document question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
Examples:
>>> from transformers import AutoConfig, TFAutoModelForDocumentQuestionAnswering
>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> model = TFAutoModelForDocumentQuestionAnswering.from_config(config)
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
stroros.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
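The attribute-override rule above can be sketched without any download: AutoConfig.for_model builds a configuration locally, and keyword arguments that match configuration attributes replace the stored defaults (the attribute values below are illustrative only).

```python
from transformers import AutoConfig

# Keys that match configuration attributes override the stored defaults.
# AutoConfig.for_model builds the config locally, so nothing is downloaded.
config = AutoConfig.for_model("bert", hidden_size=128, num_attention_heads=4)

print(config.model_type)   # bert
print(config.hidden_size)  # 128 (the BERT default is 768)
```

With from_pretrained, keys left over after the configuration update are forwarded to the model's __init__ instead.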
Instantiate one of the model classes of the library (with a document question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- layoutlm —
TFLayoutLMForQuestionAnswering (LayoutLM model)
- layoutlmv3 — TFLayoutLMv3ForQuestionAnswering (LayoutLMv3 model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForDocumentQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3")
>>> # Update configuration during loading
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="52e01b3", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/layoutlm_pt_model_config.json")
>>> model = TFAutoModelForDocumentQuestionAnswering.from_pretrained(
... "./pt_model/layoutlm_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForVisualQuestionAnswering
This is a generic model class that will be instantiated as one of the model classes of the library (with a visual question answering head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- Blip2Config configuration class: Blip2ForConditionalGeneration (BLIP-2 model)
- BlipConfig configuration class: BlipForQuestionAnswering (BLIP model)
- ViltConfig configuration class: ViltForQuestionAnswering (ViLT model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a visual question answering head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
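As a minimal sketch of this distinction, from_config builds a randomly initialized ViltForQuestionAnswering from a locally constructed configuration, with no weights downloaded or loaded (the small sizes below are arbitrary illustration values, not the released checkpoint's):

```python
from transformers import AutoConfig, AutoModelForVisualQuestionAnswering

# Build a deliberately tiny ViLT config locally (sizes are illustrative only).
config = AutoConfig.for_model(
    "vilt",
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# from_config creates the architecture with random weights:
# nothing is downloaded and no pretrained weights are loaded.
model = AutoModelForVisualQuestionAnswering.from_config(config)
print(type(model).__name__)  # ViltForQuestionAnswering
```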
from_pretrained
< source >( pretrained_model_name_or_path *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) —
A state dictionary to use instead of a state dictionary loaded from saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_tf (
bool, optional, defaults toFalse) — Load the model weights from a TensorFlow checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a visual question answering head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- blip — BlipForQuestionAnswering (BLIP model)
- blip-2 — Blip2ForConditionalGeneration (BLIP-2 model)
- vilt —
ViltForQuestionAnswering (ViLT model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForVisualQuestionAnswering
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
>>> # Update configuration during loading
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/vilt_tf_model_config.json")
>>> model = AutoModelForVisualQuestionAnswering.from_pretrained(
... "./tf_model/vilt_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )

AutoModelForVision2Seq
TFAutoModelForVision2Seq
This is a generic model class that will be instantiated as one of the model classes of the library (with a vision-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BlipConfig configuration class: TFBlipForConditionalGeneration (BLIP model)
- VisionEncoderDecoderConfig configuration class: TFVisionEncoderDecoderModel (Vision Encoder decoder model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a vision-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( pretrained_model_name_or_path *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a vision-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- blip — TFBlipForConditionalGeneration (BLIP model)
- vision-encoder-decoder —
TFVisionEncoderDecoderModel (Vision Encoder decoder model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForVision2Seq
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForVision2Seq.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

FlaxAutoModelForVision2Seq
This is a generic model class that will be instantiated as one of the model classes of the library (with a vision-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- VisionEncoderDecoderConfig configuration class: FlaxVisionEncoderDecoderModel (Vision Encoder decoder model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a vision-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( pretrained_model_name_or_path *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a PyTorch state_dict save file (e.g.,
./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- cache_dir (
str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. - from_pt (
bool, optional, defaults toFalse) — Load the model weights from a PyTorch checkpoint save file (see docstring ofpretrained_model_name_or_pathargument). - force_download (
bool, optional, defaults toFalse) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. - resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (
dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request. - output_loading_info(
bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. - local_files_only (
bool, optional, defaults toFalse) — Whether or not to only look at local files (e.g., not try downloading the model). - revision (
str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - trust_remote_code (
bool, optional, defaults toFalse) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set toTruefor repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. - code_revision (
str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. - kwargs (additional keyword arguments, optional) —
Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a vision-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- vision-encoder-decoder —
FlaxVisionEncoderDecoderModel (Vision Encoder decoder model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForVision2Seq
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForVision2Seq.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForVision2Seq.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )

AutoModelForImageTextToText
This is a generic model class that will be instantiated as one of the model classes of the library (with an image-text-to-text modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AriaConfig configuration class: AriaForConditionalGeneration (Aria model)
- AyaVisionConfig configuration class: AyaVisionForConditionalGeneration (AyaVision model)
- Blip2Config configuration class: Blip2ForConditionalGeneration (BLIP-2 model)
- BlipConfig configuration class: BlipForConditionalGeneration (BLIP model)
- ChameleonConfig configuration class: ChameleonForConditionalGeneration (Chameleon model)
- Cohere2VisionConfig configuration class: Cohere2VisionForConditionalGeneration (Cohere2Vision model)
- DeepseekVLConfig configuration class: DeepseekVLForConditionalGeneration (DeepseekVL model)
- DeepseekVLHybridConfig configuration class: DeepseekVLHybridForConditionalGeneration (DeepseekVLHybrid model)
- Emu3Config configuration class: Emu3ForConditionalGeneration (Emu3 model)
- EvollaConfig configuration class: EvollaForProteinText2Text (Evolla model)
- Florence2Config configuration class: Florence2ForConditionalGeneration (Florence2 model)
- FuyuConfig configuration class: FuyuForCausalLM (Fuyu model)
- Gemma3Config configuration class: Gemma3ForConditionalGeneration (Gemma3ForConditionalGeneration model)
- Gemma3nConfig configuration class: Gemma3nForConditionalGeneration (Gemma3nForConditionalGeneration model)
- GitConfig configuration class: GitForCausalLM (GIT model)
- Glm4vConfig configuration class: Glm4vForConditionalGeneration (GLM4V model)
- Glm4vMoeConfig configuration class: Glm4vMoeForConditionalGeneration (GLM4VMOE model)
- GotOcr2Config configuration class: GotOcr2ForConditionalGeneration (GOT-OCR2 model)
- Idefics2Config configuration class: Idefics2ForConditionalGeneration (Idefics2 model)
- Idefics3Config configuration class: Idefics3ForConditionalGeneration (Idefics3 model)
- IdeficsConfig configuration class: IdeficsForVisionText2Text (IDEFICS model)
- InstructBlipConfig configuration class: InstructBlipForConditionalGeneration (InstructBLIP model)
- InternVLConfig configuration class: InternVLForConditionalGeneration (InternVL model)
- JanusConfig configuration class: JanusForConditionalGeneration (Janus model)
- Kosmos2Config configuration class: Kosmos2ForConditionalGeneration (KOSMOS-2 model)
- Kosmos2_5Config configuration class: Kosmos2_5ForConditionalGeneration (KOSMOS-2.5 model)
- Lfm2VlConfig configuration class: Lfm2VlForConditionalGeneration (Lfm2Vl model)
- Llama4Config configuration class: Llama4ForConditionalGeneration (Llama4 model)
- LlavaConfig configuration class: LlavaForConditionalGeneration (LLaVa model)
- LlavaNextConfig configuration class: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
- LlavaNextVideoConfig configuration class: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
- LlavaOnevisionConfig configuration class: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
- Mistral3Config configuration class: Mistral3ForConditionalGeneration (Mistral3 model)
- MllamaConfig configuration class: MllamaForConditionalGeneration (Mllama model)
- Ovis2Config configuration class: Ovis2ForConditionalGeneration (Ovis2 model)
- PaliGemmaConfig configuration class: PaliGemmaForConditionalGeneration (PaliGemma model)
- PerceptionLMConfig configuration class: PerceptionLMForConditionalGeneration (PerceptionLM model)
- Pix2StructConfig configuration class: Pix2StructForConditionalGeneration (Pix2Struct model)
- PixtralVisionConfig configuration class: LlavaForConditionalGeneration (Pixtral model)
- Qwen2VLConfig configuration class: Qwen2VLForConditionalGeneration (Qwen2VL model)
- Qwen2_5_VLConfig configuration class: Qwen2_5_VLForConditionalGeneration (Qwen2_5_VL model)
- Qwen3VLConfig configuration class: Qwen3VLForConditionalGeneration (Qwen3VL model)
- Qwen3VLMoeConfig configuration class: Qwen3VLMoeForConditionalGeneration (Qwen3VLMoe model)
- ShieldGemma2Config configuration class: Gemma3ForConditionalGeneration (Shieldgemma2 model)
- SmolVLMConfig configuration class: SmolVLMForConditionalGeneration (SmolVLM model)
- UdopConfig configuration class: UdopForConditionalGeneration (UDOP model)
- VipLlavaConfig configuration class: VipLlavaForConditionalGeneration (VipLlava model)
- VisionEncoderDecoderConfig configuration class: VisionEncoderDecoderModel (Vision Encoder decoder model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with an image-text-to-text modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( pretrained_model_name_or_path *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (
str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using
save_pretrained(), e.g.,
./my_model_directory/. - A path or url to a TensorFlow index checkpoint file (e.g.,
./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) —
Will be passed along to the underlying model
__init__()method. - config (PretrainedConfig, optional) —
Configuration for the model to use instead of an automatically loaded configuration. Configuration can
be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as
pretrained_model_name_or_pathand a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so code_revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with an image-text-to-text modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- aria —
AriaForConditionalGeneration(Aria model) - aya_vision —
AyaVisionForConditionalGeneration(AyaVision model) - blip — BlipForConditionalGeneration (BLIP model)
- blip-2 — Blip2ForConditionalGeneration (BLIP-2 model)
- chameleon —
ChameleonForConditionalGeneration(Chameleon model) - cohere2_vision —
Cohere2VisionForConditionalGeneration(Cohere2Vision model) - deepseek_vl —
DeepseekVLForConditionalGeneration(DeepseekVL model) - deepseek_vl_hybrid —
DeepseekVLHybridForConditionalGeneration(DeepseekVLHybrid model) - emu3 —
Emu3ForConditionalGeneration(Emu3 model) - evolla —
EvollaForProteinText2Text(Evolla model) - florence2 —
Florence2ForConditionalGeneration(Florence2 model) - fuyu —
FuyuForCausalLM(Fuyu model) - gemma3 —
Gemma3ForConditionalGeneration(Gemma3ForConditionalGeneration model) - gemma3n —
Gemma3nForConditionalGeneration(Gemma3nForConditionalGeneration model) - git —
GitForCausalLM(GIT model) - glm4v —
Glm4vForConditionalGeneration(GLM4V model) - glm4v_moe —
Glm4vMoeForConditionalGeneration(GLM4VMOE model) - got_ocr2 —
GotOcr2ForConditionalGeneration(GOT-OCR2 model) - idefics —
IdeficsForVisionText2Text(IDEFICS model) - idefics2 —
Idefics2ForConditionalGeneration(Idefics2 model) - idefics3 —
Idefics3ForConditionalGeneration(Idefics3 model) - instructblip —
InstructBlipForConditionalGeneration(InstructBLIP model) - internvl —
InternVLForConditionalGeneration(InternVL model) - janus —
JanusForConditionalGeneration(Janus model) - kosmos-2 —
Kosmos2ForConditionalGeneration(KOSMOS-2 model) - kosmos-2.5 —
Kosmos2_5ForConditionalGeneration(KOSMOS-2.5 model) - lfm2_vl —
Lfm2VlForConditionalGeneration(Lfm2Vl model) - llama4 —
Llama4ForConditionalGeneration(Llama4 model) - llava —
LlavaForConditionalGeneration(LLaVa model) - llava_next —
LlavaNextForConditionalGeneration(LLaVA-NeXT model) - llava_next_video —
LlavaNextVideoForConditionalGeneration(LLaVa-NeXT-Video model) - llava_onevision —
LlavaOnevisionForConditionalGeneration(LLaVA-Onevision model) - mistral3 —
Mistral3ForConditionalGeneration(Mistral3 model) - mllama —
MllamaForConditionalGeneration(Mllama model) - ovis2 —
Ovis2ForConditionalGeneration(Ovis2 model) - paligemma —
PaliGemmaForConditionalGeneration(PaliGemma model) - perception_lm —
PerceptionLMForConditionalGeneration(PerceptionLM model) - pix2struct —
Pix2StructForConditionalGeneration(Pix2Struct model) - pixtral —
LlavaForConditionalGeneration(Pixtral model) - qwen2_5_vl —
Qwen2_5_VLForConditionalGeneration(Qwen2_5_VL model) - qwen2_vl —
Qwen2VLForConditionalGeneration(Qwen2VL model) - qwen3_vl —
Qwen3VLForConditionalGeneration(Qwen3VL model) - qwen3_vl_moe —
Qwen3VLMoeForConditionalGeneration(Qwen3VLMoe model) - shieldgemma2 —
Gemma3ForConditionalGeneration(Shieldgemma2 model) - smolvlm —
SmolVLMForConditionalGeneration(SmolVLM model) - udop —
UdopForConditionalGeneration(UDOP model) - vipllava —
VipLlavaForConditionalGeneration(VipLlava model) - vision-encoder-decoder —
VisionEncoderDecoderModel(Vision Encoder decoder model)
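The selection logic above boils down to a lookup from the config's model_type to a concrete class. The sketch below is a simplified, self-contained illustration of that dispatch, not the actual transformers implementation (which uses lazy module mappings); the dictionary holds only a few representative entries.

```python
# Simplified sketch of Auto-class dispatch: the real library maps config
# classes to model classes lazily; here we use plain strings for clarity.
MODEL_FOR_IMAGE_TEXT_TO_TEXT = {
    "llava": "LlavaForConditionalGeneration",
    "blip": "BlipForConditionalGeneration",
    "paligemma": "PaliGemmaForConditionalGeneration",
}

def resolve_class(model_type: str) -> str:
    """Map a config's model_type to the class an Auto class would build."""
    try:
        return MODEL_FOR_IMAGE_TEXT_TO_TEXT[model_type]
    except KeyError:
        raise ValueError(
            f"Unrecognized model_type {model_type!r} for this Auto class"
        ) from None

print(resolve_class("llava"))  # LlavaForConditionalGeneration
```

An unrecognized model_type raises a ValueError, mirroring the error you get when a checkpoint's config does not match any class registered for the Auto class.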
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
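The eval/train toggle works the same on any torch module; a minimal illustration with a dropout layer (the layer sizes are arbitrary):

```python
import torch.nn as nn

# Any nn.Module exposes the same training-mode switch that
# from_pretrained() flips off for you on load.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))

model.eval()                 # what from_pretrained() does by default
assert not model.training    # dropout is now a no-op

model.train()                # switch back before fine-tuning
assert model.training        # dropout is active again
```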
Examples:
>>> from transformers import AutoConfig, AutoModelForImageTextToText
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf")
>>> # Update configuration during loading
>>> model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForImageTextToText.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
Time Series
AutoModelForTimeSeriesPrediction
This is a generic model class that will be instantiated as one of the model classes of the library (with a time-series prediction head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
TimesFmConfigconfiguration class:TimesFmModelForPrediction(TimesFm model)
- attn_implementation (
str, optional) — The attention implementation to use in the model (if relevant). Can be any of"eager"(manual implementation of the attention),"sdpa"(usingF.scaled_dot_product_attention), or"flash_attention_2"(using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual"eager"implementation.
Instantiates one of the model classes of the library (with a time-series prediction head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( pretrained_model_name_or_path *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so code_revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
Instantiate one of the model classes of the library (with a time-series prediction head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path:
- timesfm —
TimesFmModelForPrediction(TimesFm model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForTimeSeriesPrediction
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch")
>>> # Update configuration during loading
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForTimeSeriesPrediction.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )