---
license: cc-by-nc-4.0
datasets:
- uoft-cs/cifar10
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- cifar10
---

# **MetaCLIP-2-Cifar10**
> **MetaCLIP-2-Cifar10** is a vision-language encoder model fine-tuned from **facebook/metaclip-2-worldwide-s16** for single-label image classification.
> It assigns each input image to one of the ten CIFAR-10 object classes using the **MetaClip2ForImageClassification** architecture.

> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

```
Classification report:

              precision    recall  f1-score   support

    airplane    0.9813    0.9685    0.9748      2000
  automobile    0.9777    0.9850    0.9813      2000
        bird    0.9560    0.9560    0.9560      2000
         cat    0.9104    0.9395    0.9247      2000
        deer    0.9566    0.9580    0.9573      2000
         dog    0.9476    0.9215    0.9343      2000
        frog    0.9774    0.9735    0.9755      2000
       horse    0.9704    0.9670    0.9687      2000
        ship    0.9782    0.9890    0.9836      2000
       truck    0.9774    0.9735    0.9755      2000

    accuracy                        0.9631     20000
   macro avg    0.9633    0.9632    0.9632     20000
weighted avg    0.9633    0.9631    0.9632     20000
```
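
The summary rows of the report can be cross-checked against the per-class rows. The sketch below is illustrative only and is not the evaluation script used for this model: it copies the rounded precision and recall values from the report, recomputes F1 as the harmonic mean of the two, and averages across classes. Because every class has the same support (2000), macro recall should also match the reported accuracy up to rounding. The names `REPORT` and `f1_score` are hypothetical helpers introduced here.

```python
# Per-class (precision, recall) pairs copied from the classification report.
# These are the rounded values shown in the report, so results can differ
# from the reported figures in the last decimal place.
REPORT = {
    "airplane":   (0.9813, 0.9685),
    "automobile": (0.9777, 0.9850),
    "bird":       (0.9560, 0.9560),
    "cat":        (0.9104, 0.9395),
    "deer":       (0.9566, 0.9580),
    "dog":        (0.9476, 0.9215),
    "frog":       (0.9774, 0.9735),
    "horse":      (0.9704, 0.9670),
    "ship":       (0.9782, 0.9890),
    "truck":      (0.9774, 0.9735),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


n_classes = len(REPORT)
macro_precision = sum(p for p, _ in REPORT.values()) / n_classes
macro_recall = sum(r for _, r in REPORT.values()) / n_classes
macro_f1 = sum(f1_score(p, r) for p, r in REPORT.values()) / n_classes

print(f"macro precision: {macro_precision:.4f}")
print(f"macro recall:    {macro_recall:.4f}")
print(f"macro f1:        {macro_f1:.4f}")
```

The weakest rows are `cat` and `dog`, which is where most of the gap to the other classes comes from in the macro averages.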

---
The model classifies images into the following categories:
* **Class 0:** airplane
* **Class 1:** automobile
* **Class 2:** bird
* **Class 3:** cat
* **Class 4:** deer
* **Class 5:** dog
* **Class 6:** frog
* **Class 7:** horse
* **Class 8:** ship
* **Class 9:** truck
# **Run with Transformers**
```python
!pip install -q transformers torch pillow gradio
```
```python
import gradio as gr
from transformers import AutoImageProcessor
from transformers import AutoModelForImageClassification
from transformers.image_utils import load_image
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)


def cifar10_classification(image):
    """Predicts the CIFAR-10 class represented in an image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    labels = {
        "0": "airplane",
        "1": "automobile",
        "2": "bird",
        "3": "cat",
        "4": "deer",
        "5": "dog",
        "6": "frog",
        "7": "horse",
        "8": "ship",
        "9": "truck"
    }
    # Map each class name to its probability, rounded to three decimals
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions


# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
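
The post-processing step in the function above (softmax over the logits, then mapping each index to its class name) can be sketched without any model or framework. The following is a minimal, dependency-free illustration, not part of the original app: `softmax`, `top_k`, and `example_logits` are hypothetical names, and the logits are made-up values standing in for `outputs.logits`.

```python
import math

# CIFAR-10 class names in index order, as listed in this model card.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(CIFAR10_LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for illustration only; real values come from the model.
example_logits = [0.1, 0.3, 0.2, 4.0, 0.5, 2.5, 0.0, 0.4, 0.1, 0.2]

for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

Returning only the top few classes is a design choice for display purposes; the Gradio app above returns all ten scores and lets `gr.Label` handle the ranking.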
# **Sample Inference:**







# **Intended Use:**
The **MetaCLIP-2-Cifar10** model is designed for object classification across the ten CIFAR-10 categories.
Potential use cases include:
* **Educational & Research Applications:** Benchmarking experiments, model comparison, and deep learning studies.
* **Lightweight Vision Systems:** Useful for systems requiring simple object recognition.
* **Dataset Exploration:** Assisting in data inspection, annotation, and visualization.
* **Prototype Systems:** Ideal for rapid prototyping in classification pipelines.