Update model config and README
- README.md +7 -6
- config.json +1 -1
README.md CHANGED

@@ -1,5 +1,6 @@
 ---
 tags:
+- image-feature-extraction
 - timm
 - transformers
 pipeline_tag: image-feature-extraction
@@ -10,7 +11,7 @@ license_link: https://ai.meta.com/resources/models-and-libraries/dinov3-license
 datasets:
 - lvd-1689m
 ---
-# Model card for vit_small_patch16_dinov3_qkvb.
+# Model card for vit_small_patch16_dinov3_qkvb.lvd_1689m
 
 A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3 ViT-7B model.
 
@@ -19,7 +20,7 @@ A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3
 * The original models keep RoPE periods as a persistent `bfloat16` buffer. `timm` generates `float32` periods at init. This results in some numerical differences, however the `timm` approach should be less problematic running on devices without bfloat16 support, and appears to work as well if not slightly better for fine-tuning. `model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)` will truncate the periods to bfloat16 and result in matching outputs.
 
 ## Model Details
-- **Model Type:** Image
+- **Model Type:** Image Feature Encoder
 - **Model Stats:**
   - Params (M): 21.6
   - GMACs: 6.3
@@ -44,7 +45,7 @@ img = Image.open(urlopen(
     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
 ))
 
-model = timm.create_model('vit_small_patch16_dinov3_qkvb.
+model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
 model = model.eval()
 
 # get model specific transforms (normalization, resize)
@@ -67,7 +68,7 @@ img = Image.open(urlopen(
 ))
 
 model = timm.create_model(
-    'vit_small_patch16_dinov3_qkvb.
+    'vit_small_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     features_only=True,
 )
@@ -100,7 +101,7 @@ img = Image.open(urlopen(
 ))
 
 model = timm.create_model(
-    'vit_small_patch16_dinov3_qkvb.
+    'vit_small_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     num_classes=0, # remove classifier nn.Linear
 )
@@ -190,4 +191,4 @@ See the associated paper for details on the evaluation protocols
   doi = {10.5281/zenodo.4414861},
   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
 }
-```
+```
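For context on the renamed checkpoint id and the RoPE note carried through above, the sketch below puts the pieces together: it loads the model under the new `.lvd_1689m` tag, optionally truncates the RoPE periods to `bfloat16` for output parity with the original DINOv3 release, and runs a forward pass. The transform and forward calls follow timm's usual model-card pattern and are not part of this diff, so treat this as a sketch rather than the card verbatim.

```python
from urllib.request import urlopen
from PIL import Image
import torch
import timm

# Sample image used throughout the model card.
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

# Load by the renamed checkpoint id: architecture name + '.' + pretrained tag.
model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
model = model.eval()

# Optional: truncate the RoPE periods to bfloat16 so outputs match the
# original release, per the note quoted in the hunk above.
model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)

# Model-specific transforms (resize, normalization) resolved from the pretrained cfg.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():
    features = model(transforms(img).unsqueeze(0))  # pooled image features
print(features.shape)
```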
config.json CHANGED

@@ -4,7 +4,7 @@
     "num_features": 384,
     "global_pool": "avg",
     "pretrained_cfg": {
-        "tag": "
+        "tag": "lvd_1689m",
         "custom_load": false,
         "input_size": [
             3,
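The `tag` written to `pretrained_cfg` here is what the `.lvd_1689m` suffix in the README's `timm.create_model` calls resolves to. A quick way to confirm the pairing, assuming a recent timm release that exposes `pretrained_cfg` on created models and provides `timm.list_pretrained`:

```python
import timm

# The tagged name pairs the architecture with the pretrained_cfg tag above.
model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)

# Resolved pretrained config for this tag (input_size, mean/std, etc.).
print(model.pretrained_cfg)

# Registered pretrained variants for this architecture (recent timm only).
print(timm.list_pretrained('vit_small_patch16_dinov3_qkvb*'))
```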