IMAGENETTE is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It classifies images into the 10 categories of the Imagenette dataset, a 10-class subset of ImageNet, using the SiglipForImageClassification architecture.
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786
ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/pdf/1409.0575
Classification Report:
                  precision    recall  f1-score   support
           tench     0.9885    0.9834    0.9859       963
english springer     0.9843    0.9822    0.9832       955
 cassette player     0.9544    0.9486    0.9515       993
       chain saw     0.9257    0.8998    0.9125       858
          church     0.9654    0.9798    0.9726       941
     French horn     0.9757    0.9665    0.9711       956
   garbage truck     0.8883    0.9761    0.9301       961
        gas pump     0.9366    0.9044    0.9202       931
       golf ball     0.9925    0.9716    0.9819       951
       parachute     0.9821    0.9708    0.9764       960
        accuracy                         0.9590      9469
       macro avg     0.9593    0.9583    0.9586      9469
    weighted avg     0.9597    0.9590    0.9591      9469
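The summary rows of the report can be cross-checked from the per-class rows. The following is a minimal sketch, assuming the precision, recall, and support values transcribed from the report above; the names `REPORT` and `f1` are illustrative and not part of the model repository. Because the inputs are rounded to four decimals, the recomputed figures may differ from the report in the last digit.

```python
# (precision, recall, support) per class, transcribed from the report above
REPORT = {
    "tench": (0.9885, 0.9834, 963),
    "english springer": (0.9843, 0.9822, 955),
    "cassette player": (0.9544, 0.9486, 993),
    "chain saw": (0.9257, 0.8998, 858),
    "church": (0.9654, 0.9798, 941),
    "French horn": (0.9757, 0.9665, 956),
    "garbage truck": (0.8883, 0.9761, 961),
    "gas pump": (0.9366, 0.9044, 931),
    "golf ball": (0.9925, 0.9716, 951),
    "parachute": (0.9821, 0.9708, 960),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {name: f1(p, r) for name, (p, r, _) in REPORT.items()}
total_support = sum(s for _, _, s in REPORT.values())

# Macro average: unweighted mean over classes
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: mean over classes weighted by support
weighted_f1 = sum(f1_scores[n] * s for n, (_, _, s) in REPORT.items()) / total_support

# In single-label classification, accuracy equals support-weighted recall
accuracy = sum(r * s for _, r, s in REPORT.values()) / total_support

for name, score in f1_scores.items():
    print(f"{name:>16}  f1={score:.4f}")
print(f"macro f1={macro_f1:.4f}  weighted f1={weighted_f1:.4f}  accuracy={accuracy:.4f}")
```

The gap between precision (0.8883) and recall (0.9761) for garbage truck suggests that images of other classes are sometimes labeled as garbage truck, while true garbage truck images are rarely missed.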
The model predicts one of the following image classes:
0: tench
1: english springer
2: cassette player
3: chain saw
4: church
5: French horn
6: garbage truck
7: gas pump
8: golf ball
9: parachute
pip install -q transformers torch pillow gradio hf_xet
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/IMAGENETTE"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
# Label mapping
id2label = {
    "0": "tench",
    "1": "english springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}
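def classify_image(image):
    """Return a {label: probability} dict for an image given as a NumPy array."""
    if image is None:
        # Gradio passes None when no image has been uploaded
        return None
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label name and rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction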
# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
    title="IMAGENETTE - SigLIP2 Classifier",
    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
)
if __name__ == "__main__":
    iface.launch()
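The scoring step in classify_image (softmax over the logits, then the top three classes shown by the Label component) can be illustrated without loading the model. This is a minimal sketch in plain Python; `LABELS`, `top_k_predictions`, and the example logits are hypothetical and chosen only for illustration, not taken from real model output.

```python
import math

# Class names in index order, matching the label mapping above
LABELS = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]


def top_k_predictions(logits, k=3):
    """Apply softmax to raw logits and return the k most probable (label, prob) pairs."""
    # Subtract the maximum logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort labels by probability, highest first
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, round(p, 3)) for label, p in ranked[:k]]


# Made-up logits for a single image; index 6 (garbage truck) is the largest
example_logits = [0.1, -1.2, 0.3, 0.0, -0.5, 0.2, 4.0, 1.5, -2.0, 0.4]
top3 = top_k_predictions(example_logits)
print(top3)
```

In the script above, the same ranking is done by gr.Label(num_top_classes=3), which receives the full label-to-probability dictionary and displays only the three highest entries.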
IMAGENETTE is designed for: