	---
	license: cc-by-nc-4.0
	datasets:
	- uoft-cs/cifar10
	language:
	- en
	base_model:
	- facebook/metaclip-2-worldwide-s16
	pipeline_tag: image-classification
	library_name: transformers
	tags:
	- text-generation-inference
	- cifar10
	---

	![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/mZz2vZy1IENHbtmXm1lUe.png)

	# MetaCLIP-2-Cifar10

	> MetaCLIP-2-Cifar10 is an image classification model fine-tuned from the facebook/metaclip-2-worldwide-s16 vision-language encoder for single-label classification.
	> It assigns each input image to one of the ten CIFAR-10 object classes using the MetaClip2ForImageClassification architecture.

	> [!note]
	> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

	```
	Classification report:

	              precision    recall  f1-score   support

	    airplane     0.9813    0.9685    0.9748      2000
	  automobile     0.9777    0.9850    0.9813      2000
	        bird     0.9560    0.9560    0.9560      2000
	         cat     0.9104    0.9395    0.9247      2000
	        deer     0.9566    0.9580    0.9573      2000
	         dog     0.9476    0.9215    0.9343      2000
	        frog     0.9774    0.9735    0.9755      2000
	       horse     0.9704    0.9670    0.9687      2000
	        ship     0.9782    0.9890    0.9836      2000
	       truck     0.9774    0.9735    0.9755      2000

	    accuracy                         0.9631     20000
	   macro avg     0.9633    0.9632    0.9632     20000
	weighted avg     0.9633    0.9631    0.9632     20000
	```

	![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/dr7B2yAcfNEJ6ScY6XNC5.png)
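
As a sanity check on how the columns of the classification report relate to each other, the following minimal sketch recomputes the per-class F1 scores and the macro averages from the reported precision and recall values. The helper names `f1_score` and `macro_average` are illustrative and not part of any library used by this model. Because the inputs are already rounded to four decimals, the recomputed numbers should only be expected to agree with the report up to rounding. With equal support per class, macro recall and accuracy coincide up to that same rounding.

```python
# Per-class (precision, recall) pairs copied from the classification report.
REPORT = {
    "airplane": (0.9813, 0.9685),
    "automobile": (0.9777, 0.9850),
    "bird": (0.9560, 0.9560),
    "cat": (0.9104, 0.9395),
    "deer": (0.9566, 0.9580),
    "dog": (0.9476, 0.9215),
    "frog": (0.9774, 0.9735),
    "horse": (0.9704, 0.9670),
    "ship": (0.9782, 0.9890),
    "truck": (0.9774, 0.9735),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes (every class counts equally)."""
    values = list(values)
    return sum(values) / len(values)


per_class_f1 = {name: f1_score(p, r) for name, (p, r) in REPORT.items()}
macro_precision = macro_average(p for p, _ in REPORT.values())
macro_recall = macro_average(r for _, r in REPORT.values())
macro_f1 = macro_average(per_class_f1.values())

for name, score in per_class_f1.items():
    print(f"{name:<12}{score:.4f}")
print(f"macro avg   P={macro_precision:.4f} R={macro_recall:.4f} F1={macro_f1:.4f}")
```

The lowest scores belong to `cat` and `dog`, which suggests that most of the remaining errors are confusions between those two classes.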

	---

	The model classifies images into the following categories:

	* Class 0: airplane
	* Class 1: automobile
	* Class 2: bird
	* Class 3: cat
	* Class 4: deer
	* Class 5: dog
	* Class 6: frog
	* Class 7: horse
	* Class 8: ship
	* Class 9: truck

	# Run with Transformers

	```python
	!pip install -q transformers torch pillow gradio
	```

	```python
	import gradio as gr
	import torch
	from PIL import Image
	from transformers import AutoImageProcessor
	from transformers import AutoModelForImageClassification

	# Load model and processor
	model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
	model = AutoModelForImageClassification.from_pretrained(model_name)
	processor = AutoImageProcessor.from_pretrained(model_name)

	def cifar10_classification(image):
	    """Predicts the CIFAR-10 class represented in an image."""
	    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
	    image = Image.fromarray(image).convert("RGB")
	    inputs = processor(images=image, return_tensors="pt")

	    # Run inference without tracking gradients
	    with torch.no_grad():
	        outputs = model(**inputs)
	        logits = outputs.logits
	        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

	    # Class index -> label name
	    labels = {
	        "0": "airplane",
	        "1": "automobile",
	        "2": "bird",
	        "3": "cat",
	        "4": "deer",
	        "5": "dog",
	        "6": "frog",
	        "7": "horse",
	        "8": "ship",
	        "9": "truck"
	    }
	    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

	    return predictions

	# Create Gradio interface
	iface = gr.Interface(
	    fn=cifar10_classification,
	    inputs=gr.Image(type="numpy"),
	    outputs=gr.Label(label="Prediction Scores"),
	    title="CIFAR-10 Classification",
	    description="Upload an image to classify it into one of the CIFAR-10 categories."
	)

	# Launch the app
	if __name__ == "__main__":
	    iface.launch()
	```
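
For scripted use without the Gradio UI, something along these lines may be enough. This is a minimal sketch, not an official API: `top_k` and `classify_image` are hypothetical helper names, the label order is taken from the class list in this README, and the image path passed to `classify_image` is whatever local file you want to test.

```python
# Class names in index order, as listed in this README.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_k(probs, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(CIFAR10_LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify_image(path, model_name="prithivMLmods/MetaCLIP-2-Cifar10", k=3):
    """Classify one image file and return its top-k CIFAR-10 labels."""
    # Heavy imports are kept local so top_k can be used without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    model = AutoModelForImageClassification.from_pretrained(model_name)
    processor = AutoImageProcessor.from_pretrained(model_name)

    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradients are not needed.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Softmax over the class dimension, then take the single image in the batch.
    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return top_k(probs, k)
```

Calling `classify_image("my_photo.png")` (with a path of your own) would return a list of three `(label, probability)` pairs. The model is reloaded on every call in this sketch, so for many images it would be better to load the model and processor once and reuse them.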

	# Sample Inference:

	![Screenshot 2025-11-15 at 08-21-23 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/vPnT4-Imqykvjll9t5aYC.png)
	![Screenshot 2025-11-15 at 08-26-25 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1vRKZKk8mWIhw4IV_DZYV.png)
	![Screenshot 2025-11-15 at 08-22-10 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/72idt8H-cjX2pLOOTgNxZ.png)
	![Screenshot 2025-11-15 at 08-22-41 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VEE08FlRAaSzCaOyq6135.png)
	![Screenshot 2025-11-15 at 08-23-53 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/SFjNL9AIkL0myJ2HSrjfk.png)
	![Screenshot 2025-11-15 at 08-24-30 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/6M8Z5PlbD1QSJ5Sbdo1u-.png)
	![Screenshot 2025-11-15 at 08-25-04 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jNv67l2-M3c_TYmwGg25f.png)

	# Intended Use:

	The MetaCLIP-2-Cifar10 model is designed for object classification across the ten CIFAR-10 categories.
	Potential use cases include:

	* Educational & Research Applications: Benchmarking experiments, model comparison, and deep learning studies.
	* Lightweight Vision Systems: Suitable for applications that only need coarse object recognition among these ten classes.
	* Dataset Exploration: Assisting in data inspection, annotation, and visualization.
	* Prototype Systems: Ideal for rapid prototyping in classification pipelines.