Update README.md

README.md CHANGED
@@ -18,9 +18,9 @@ pipeline_tag: zero-shot-image-classification
 2. [Uses](#uses)
 3. [Training Details](#training-details)
 4. [Evaluation](#evaluation)
-5. [
-6. [
-7. [
+5. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+6. [Acknowledgements](#acknowledgements)
+7. [Citation](#citation)


 # Model Details
@@ -118,6 +118,37 @@ The testing is performed on a suite of 38 datasets. See our paper for more details.

 The model achieves 72.7% zero-shot top-1 accuracy on ImageNet-1k, 64.4% image retrieval recall@5, and 80.7% text retrieval recall@5 on COCO captions.

+# How to Get Started with the Model
+
+Zero-shot classification example:
+
+```python
+import torch
+from PIL import Image
+import open_clip
+
+model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
+tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+
+image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
+text = tokenizer(["a diagram", "a dog", "a cat"])
+
+with torch.no_grad(), torch.autocast("cuda"):
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text)
+    image_features /= image_features.norm(dim=-1, keepdim=True)
+    text_features /= text_features.norm(dim=-1, keepdim=True)
+
+    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
+```
+
+# Acknowledgements
+
+We gratefully acknowledge the computing time granted by the John von Neumann Institute for Computing (NIC)
+and provided on the supercomputer JURECA at Jülich Supercomputing Centre (JSC).

 # Citation

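For readers who want to see what sits behind the numbers in this hunk: zero-shot top-1 accuracy is conventionally measured by building a text classifier out of class-name prompts and assigning each image to its nearest class embedding. Below is a minimal sketch of that procedure, assuming `open_clip` only; the class names and the single prompt template are placeholders, and the reported 72.7% comes from the full ImageNet-1k label set with an ensemble of prompt templates (via the CLIP benchmark software referenced in the next hunk), not from this snippet.

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
model.eval()

classnames = ["tench", "goldfish", "great white shark"]  # placeholder; the real run uses all 1000 classes

with torch.no_grad():
    # One prompt per class; real evaluations average embeddings over many templates.
    text = tokenizer([f"a photo of a {c}" for c in classnames])
    classifier = model.encode_text(text)
    classifier /= classifier.norm(dim=-1, keepdim=True)

def zero_shot_top1(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: preprocessed batch (N, 3, 256, 256); labels: (N,) class indices."""
    with torch.no_grad():
        feats = model.encode_image(images)
        feats /= feats.norm(dim=-1, keepdim=True)
        preds = (feats @ classifier.T).argmax(dim=-1)  # nearest class by cosine similarity
    return (preds == labels).float().mean().item()
```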
@@ -190,30 +221,3 @@ CLIP benchmark software
 },
 }

-# How to Get Started with the Model
-
-Zero-shot classification example:
-
-```python
-import torch
-from PIL import Image
-import open_clip
-
-model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
-tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-
-image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
-text = tokenizer(["a diagram", "a dog", "a cat"])
-
-with torch.no_grad(), torch.autocast("cuda"):
-    image_features = model.encode_image(image)
-    text_features = model.encode_text(text)
-    image_features /= image_features.norm(dim=-1, keepdim=True)
-    text_features /= text_features.norm(dim=-1, keepdim=True)
-
-    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
-
-print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
-
-```
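A closing note on the usage example this commit moves up: because both feature tensors are L2-normalized, `image_features @ text_features.T` is a matrix of cosine similarities, and the hard-coded `100.0` stands in for CLIP's learned logit scale (trained to roughly exp(4.6) ≈ 100), which sharpens the softmax toward one-hot probabilities. A small variant using the model's own scale instead of the constant; storing the scale in log space as `logit_scale` is standard for open_clip's CLIP class, but treat that as an assumption for any particular checkpoint:

```python
# model.logit_scale holds the log of the scale; exp() recovers the
# multiplier (close to 100 for fully trained CLIP models).
scale = model.logit_scale.exp()
text_probs = (scale * image_features @ text_features.T).softmax(dim=-1)
```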