# ConvNeXt-XXL Fine-Tuned on DTD
This repository provides a ConvNeXt-XXL convolutional neural network fine-tuned on the Describable Textures Dataset (DTD) for 47-class texture classification.
## Citation

If you use this model in your work, please cite it as:

```
@misc{vishnu_chityala_2025,
  author    = {Vishnu Chityala},
  title     = {dtd-convnext-xxl (Revision b9ec966)},
  year      = 2025,
  url       = {https://huggingface.co/vishnurchityala/dtd-convnext-xxl},
  doi       = {10.57967/hf/6417},
  publisher = {Hugging Face}
}
```
## Training Results

### Accuracy Curves

![Accuracy curves for ConvNeXt fine-tuning](training_plot_acc_80.59.png)

### Final Evaluation Metrics

- Accuracy: 80.6%
- Loss: ~0.687
## Files

- `convnext_dtd_acc_80.59.pth` – fine-tuned model checkpoint (PyTorch)
- `training_plot_acc_80.59.png` – accuracy curves for ConvNeXt fine-tuning
- `README.md` – model card and documentation
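The checkpoint can also be fetched programmatically with `hf_hub_download` from the `huggingface_hub` library. A minimal sketch, assuming the package is installed (`pip install huggingface_hub`); the repo id and filename are taken from this page:

```python
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub (cached locally); returns the file path
checkpoint_path = hf_hub_download(
    repo_id="vishnurchityala/dtd-convnext-xxl",
    filename="convnext_dtd_acc_80.59.pth",
)
print(checkpoint_path)
```

The returned path can be passed directly to `torch.load` in the usage example below.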
## Usage

Load the model checkpoint with PyTorch and run inference:
```python
import timm
import torch
from torch import nn
from PIL import Image
from torchvision import transforms

# Define the model class: a ConvNeXt-XXL backbone with a custom MLP head
class ConvNeXtClassifier(nn.Module):
    def __init__(self, num_classes=47, pretrained=True, mlp_hidden=None,
                 dropout=0.35, freeze_backbone=True):
        super().__init__()
        if mlp_hidden is None:
            mlp_hidden = [1024, 512, 256]
        # Backbone with its original classification head replaced by Identity
        self.backbone = timm.create_model("convnext_xxlarge", pretrained=pretrained)
        in_features = self.backbone.head.fc.in_features
        self.backbone.head.fc = nn.Identity()
        # MLP classifier: Linear -> ReLU -> Dropout per hidden layer, then the output layer
        layers = []
        input_dim = in_features
        for h in mlp_hidden:
            layers += [nn.Linear(input_dim, h), nn.ReLU(inplace=True), nn.Dropout(dropout)]
            input_dim = h
        layers.append(nn.Linear(input_dim, num_classes))
        self.classifier = nn.Sequential(*layers)
        # Optionally freeze the backbone so only the classifier head is trainable
        if freeze_backbone:
            for param in self.backbone.stem.parameters():
                param.requires_grad = False
            for param in self.backbone.stages.parameters():
                param.requires_grad = False

    def forward(self, x):
        x = self.backbone(x)
        x = self.classifier(x)
        return x

# Load the fine-tuned weights
model = ConvNeXtClassifier(pretrained=False)
model.load_state_dict(torch.load("convnext_dtd_acc_80.59.pth", map_location="cpu"))
model.eval()

# Example inference
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
image = Image.open("example_image.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    output = model(input_tensor)
predicted_class = output.argmax(dim=1).item()
print(f"Predicted class: {predicted_class}")
```
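The script above prints only the arg-max class index. To report a confidence score or the top-k predictions, the raw logits can be passed through a softmax first. A minimal sketch in plain Python; the helper name and the mock logits are illustrative, not part of this repository:

```python
import math

def softmax_topk(logits, k=3):
    """Convert raw logits to probabilities and return the top-k (index, prob) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(enumerate(probs), key=lambda p: p[1], reverse=True)[:k]

# Example with mock logits for a 5-class case
for idx, p in softmax_topk([0.2, 2.5, 0.1, 1.0, -0.3], k=3):
    print(f"class {idx}: {p:.3f}")
```

Note that the integer index reflects the class ordering used at training time; a mapping from indices to DTD texture names is not included in this repository and must match that ordering.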
