---
license: cc-by-nc-4.0
datasets:
- uoft-cs/cifar10
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- cifar10
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/mZz2vZy1IENHbtmXm1lUe.png)

# **MetaCLIP-2-Cifar10**

> **MetaCLIP-2-Cifar10** is an image classification model fine-tuned from the vision-language encoder **facebook/metaclip-2-worldwide-s16** for single-label classification.
> It assigns each input image to one of the ten CIFAR-10 object classes using the **MetaClip2ForImageClassification** architecture.

> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

```
Classification report:

              precision    recall  f1-score   support

    airplane     0.9813    0.9685    0.9748      2000
  automobile     0.9777    0.9850    0.9813      2000
        bird     0.9560    0.9560    0.9560      2000
         cat     0.9104    0.9395    0.9247      2000
        deer     0.9566    0.9580    0.9573      2000
         dog     0.9476    0.9215    0.9343      2000
        frog     0.9774    0.9735    0.9755      2000
       horse     0.9704    0.9670    0.9687      2000
        ship     0.9782    0.9890    0.9836      2000
       truck     0.9774    0.9735    0.9755      2000

    accuracy                         0.9631     20000
   macro avg     0.9633    0.9632    0.9632     20000
weighted avg     0.9633    0.9631    0.9632     20000
```
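
As a quick sanity check on the report, the per-class F1 scores and the macro averages can be recomputed from the precision and recall columns. The sketch below uses only the rounded figures printed above, so small differences in the fourth decimal place are expected; the variable names are illustrative and not part of the model's code.

```python
# Per-class (precision, recall, f1) copied from the classification report above.
report = {
    "airplane":   (0.9813, 0.9685, 0.9748),
    "automobile": (0.9777, 0.9850, 0.9813),
    "bird":       (0.9560, 0.9560, 0.9560),
    "cat":        (0.9104, 0.9395, 0.9247),
    "deer":       (0.9566, 0.9580, 0.9573),
    "dog":        (0.9476, 0.9215, 0.9343),
    "frog":       (0.9774, 0.9735, 0.9755),
    "horse":      (0.9704, 0.9670, 0.9687),
    "ship":       (0.9782, 0.9890, 0.9836),
    "truck":      (0.9774, 0.9735, 0.9755),
}

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Largest gap between a recomputed F1 and the reported F1 (rounding noise only).
max_f1_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in report.values())

# Macro average: unweighted mean over the ten classes.
n = len(report)
macro_precision = sum(p for p, _, _ in report.values()) / n
macro_recall = sum(r for _, r, _ in report.values()) / n
macro_f1 = sum(f1 for _, _, f1 in report.values()) / n

print(f"max F1 gap: {max_f1_gap:.5f}")
print(f"macro avg:  {macro_precision:.4f} {macro_recall:.4f} {macro_f1:.4f}")
```

Because every class has the same support (2000), the weighted average matches the macro average up to rounding.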

![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/dr7B2yAcfNEJ6ScY6XNC5.png)

---

The model classifies images into the following categories:

* **Class 0:** airplane
* **Class 1:** automobile
* **Class 2:** bird
* **Class 3:** cat
* **Class 4:** deer
* **Class 5:** dog
* **Class 6:** frog
* **Class 7:** horse
* **Class 8:** ship
* **Class 9:** truck
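
The index-to-name mapping listed above can be kept in one place and inverted when needed. This is a minimal sketch: `CLASS_NAMES`, `id2label`, and `label2id` are illustrative names defined here, and the assumption is that the model's output index order follows the list above.

```python
# Class names in index order, as listed above (assumed to match the model's output order).
CLASS_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Forward and reverse lookups built from the single source list.
id2label = dict(enumerate(CLASS_NAMES))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[3])       # cat
print(label2id["ship"])  # 8
```

Deriving both dictionaries from one list avoids the two mappings drifting apart if a name is edited.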

# **Run with Transformers**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def cifar10_classification(image):
    """Predicts the CIFAR-10 class represented in an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    labels = {
        "0": "airplane",
        "1": "automobile",
        "2": "bird",
        "3": "cat",
        "4": "deer",
        "5": "dog",
        "6": "frog",
        "7": "horse",
        "8": "ship",
        "9": "truck"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
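
To illustrate what the softmax step in the function above does, the following sketch converts a vector of logits into probabilities and ranks the classes, using only the standard library. The logits are made-up values for illustration, not real model output, and `softmax` and `top_k` are hypothetical helpers rather than part of the model's API.

```python
import math

CLASS_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(CLASS_NAMES, probs), key=lambda pair: pair[1], reverse=True)
    return [(name, round(p, 3)) for name, p in ranked[:k]]

# Hypothetical logits for a single image (one score per class).
example_logits = [0.2, -1.0, 0.5, 4.1, 0.3, 2.7, -0.4, 0.0, -1.2, -0.8]
print(top_k(example_logits))
```

With these example values the ranking starts with cat, then dog, then bird, since those classes have the three largest logits.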

# **Sample Inference:**

![Screenshot 2025-11-15 at 08-21-23 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/vPnT4-Imqykvjll9t5aYC.png)
![Screenshot 2025-11-15 at 08-26-25 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1vRKZKk8mWIhw4IV_DZYV.png)
![Screenshot 2025-11-15 at 08-22-10 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/72idt8H-cjX2pLOOTgNxZ.png)
![Screenshot 2025-11-15 at 08-22-41 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VEE08FlRAaSzCaOyq6135.png)
![Screenshot 2025-11-15 at 08-23-53 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/SFjNL9AIkL0myJ2HSrjfk.png)
![Screenshot 2025-11-15 at 08-24-30 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/6M8Z5PlbD1QSJ5Sbdo1u-.png)
![Screenshot 2025-11-15 at 08-25-04 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jNv67l2-M3c_TYmwGg25f.png)

# **Intended Use:**

The **MetaCLIP-2-Cifar10** model is designed for object classification across the ten CIFAR-10 categories.
Potential use cases include:

* **Educational & Research Applications:** Benchmarking experiments, model comparison, and deep learning studies.
* **Lightweight Vision Systems:** Applications that only need to distinguish among the ten CIFAR-10 object categories.
* **Dataset Exploration:** Assisting in data inspection, annotation, and visualization.
* **Prototype Systems:** Rapid prototyping of image classification pipelines.