Update model config and README
- README.md +7 -6
- config.json +1 -1
README.md CHANGED

@@ -1,5 +1,6 @@
 ---
 tags:
+- image-feature-extraction
 - timm
 - transformers
 pipeline_tag: image-feature-extraction
@@ -10,7 +11,7 @@ license_link: https://ai.meta.com/resources/models-and-libraries/dinov3-license
 datasets:
 - lvd-1689m
 ---
-# Model card for vit_small_patch16_dinov3_qkvb.
+# Model card for vit_small_patch16_dinov3_qkvb.lvd_1689m
 
 A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3 ViT-7B model.
 
@@ -19,7 +20,7 @@ A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3
 * The original models keep RoPE periods as a persistent `bfloat16` buffer. `timm` generates `float32` periods at init. This results in some numerical differences, however the `timm` approach should be less problematic running on devices without bfloat16 support, and appears to work as well if not slightly better for fine-tuning. `model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)` will truncate the periods to bfloat16 and result in matching outputs.
 
 ## Model Details
-- **Model Type:** Image
+- **Model Type:** Image Feature Encoder
 - **Model Stats:**
   - Params (M): 21.6
   - GMACs: 6.3
@@ -44,7 +45,7 @@ img = Image.open(urlopen(
     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
 ))
 
-model = timm.create_model('vit_small_patch16_dinov3_qkvb.
+model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
 model = model.eval()
 
 # get model specific transforms (normalization, resize)
@@ -67,7 +68,7 @@ img = Image.open(urlopen(
 ))
 
 model = timm.create_model(
-    'vit_small_patch16_dinov3_qkvb.
+    'vit_small_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     features_only=True,
 )
@@ -100,7 +101,7 @@ img = Image.open(urlopen(
 ))
 
 model = timm.create_model(
-    'vit_small_patch16_dinov3_qkvb.
+    'vit_small_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     num_classes=0, # remove classifier nn.Linear
 )
@@ -190,4 +191,4 @@ See the associated paper for details on the evaluation protocols
   doi = {10.5281/zenodo.4414861},
   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
 }
-```
+```
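For context on the renamed checkpoint id and the RoPE note carried through above, the sketch below puts the pieces together: it loads the model under the new `.lvd_1689m` tag, optionally truncates the RoPE periods to `bfloat16` for output parity with the original DINOv3 release, and runs a forward pass. The transform and forward calls follow timm's usual model-card pattern and are not part of this diff, so treat this as a sketch rather than the card verbatim.

```python
from urllib.request import urlopen
from PIL import Image
import torch
import timm

# Sample image used throughout the model card.
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

# Load by the renamed checkpoint id: architecture name + '.' + pretrained tag.
model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
model = model.eval()

# Optional: truncate the RoPE periods to bfloat16 so outputs match the
# original release, per the note quoted in the hunk above.
model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)

# Model-specific transforms (resize, normalization) resolved from the pretrained cfg.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():
    features = model(transforms(img).unsqueeze(0))  # pooled image features
print(features.shape)
```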
config.json CHANGED

@@ -4,7 +4,7 @@
     "num_features": 384,
     "global_pool": "avg",
     "pretrained_cfg": {
-        "tag": "
+        "tag": "lvd_1689m",
         "custom_load": false,
         "input_size": [
             3,
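The `tag` written to `pretrained_cfg` here is what the `.lvd_1689m` suffix in the README's `timm.create_model` calls resolves to. A quick way to confirm the pairing, assuming a recent timm release that exposes `pretrained_cfg` on created models and provides `timm.list_pretrained`:

```python
import timm

# The tagged name pairs the architecture with the pretrained_cfg tag above.
model = timm.create_model('vit_small_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)

# Resolved pretrained config for this tag (input_size, mean/std, etc.).
print(model.pretrained_cfg)

# Registered pretrained variants for this architecture (recent timm only).
print(timm.list_pretrained('vit_small_patch16_dinov3_qkvb*'))
```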