Update README.md

README.md CHANGED
@@ -18,9 +18,9 @@ pipeline_tag: zero-shot-image-classification
 2. [Uses](#uses)
 3. [Training Details](#training-details)
 4. [Evaluation](#evaluation)
-5. [
-6. [
-7. [
+5. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+6. [Acknowledgements](#acknowledgements)
+7. [Citation](#citation)


 # Model Details
@@ -118,6 +118,37 @@ The testing is performed on a suite of 38 datasets. See our paper for more details.

 The model achieves 72.7% zero-shot top-1 accuracy on ImageNet-1k, 64.4% image retrieval recall@5, and 80.7% text retrieval recall@5 on COCO captions.

+# How to Get Started with the Model
+
+Zero-shot classification example:
+
+```python
+import torch
+from PIL import Image
+import open_clip
+
+model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
+tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+
+image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
+text = tokenizer(["a diagram", "a dog", "a cat"])
+
+with torch.no_grad(), torch.autocast("cuda"):
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text)
+    image_features /= image_features.norm(dim=-1, keepdim=True)
+    text_features /= text_features.norm(dim=-1, keepdim=True)
+
+    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
+```
+
+# Acknowledgements
+
+We gratefully acknowledge the computing time granted by the John von Neumann Institute for Computing (NIC)
+and provided on the supercomputer JURECA at Jülich Supercomputing Centre (JSC).

 # Citation

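For readers who want to see what sits behind the numbers in this hunk: zero-shot top-1 accuracy is conventionally measured by building a text classifier out of class-name prompts and assigning each image to its nearest class embedding. Below is a minimal sketch of that procedure, assuming `open_clip` only; the class names and the single prompt template are placeholders, and the reported 72.7% comes from the full ImageNet-1k label set with an ensemble of prompt templates (via the CLIP benchmark software referenced in the next hunk), not from this snippet.

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
model.eval()

classnames = ["tench", "goldfish", "great white shark"]  # placeholder; the real run uses all 1000 classes

with torch.no_grad():
    # One prompt per class; real evaluations average embeddings over many templates.
    text = tokenizer([f"a photo of a {c}" for c in classnames])
    classifier = model.encode_text(text)
    classifier /= classifier.norm(dim=-1, keepdim=True)

def zero_shot_top1(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: preprocessed batch (N, 3, 256, 256); labels: (N,) class indices."""
    with torch.no_grad():
        feats = model.encode_image(images)
        feats /= feats.norm(dim=-1, keepdim=True)
        preds = (feats @ classifier.T).argmax(dim=-1)  # nearest class by cosine similarity
    return (preds == labels).float().mean().item()
```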
@@ -190,30 +221,3 @@ CLIP benchmark software
 },
 }

-# How to Get Started with the Model
-
-Zero-shot classification example:
-
-```python
-import torch
-from PIL import Image
-import open_clip
-
-model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
-tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-
-image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
-text = tokenizer(["a diagram", "a dog", "a cat"])
-
-with torch.no_grad(), torch.autocast("cuda"):
-    image_features = model.encode_image(image)
-    text_features = model.encode_text(text)
-    image_features /= image_features.norm(dim=-1, keepdim=True)
-    text_features /= text_features.norm(dim=-1, keepdim=True)
-
-    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
-
-print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
-
-```
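A closing note on the usage example this commit moves up: because both feature tensors are L2-normalized, `image_features @ text_features.T` is a matrix of cosine similarities, and the hard-coded `100.0` stands in for CLIP's learned logit scale (trained to roughly exp(4.6) ≈ 100), which sharpens the softmax toward one-hot probabilities. A small variant using the model's own scale instead of the constant; storing the scale in log space as `logit_scale` is standard for open_clip's CLIP class, but treat that as an assumption for any particular checkpoint:

```python
# model.logit_scale holds the log of the scale; exp() recovers the
# multiplier (close to 100 for fully trained CLIP models).
scale = model.logit_scale.exp()
text_probs = (scale * image_features @ text_features.T).softmax(dim=-1)
```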