SDXL Detector (ResNet-50)

Model Description

A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

Architecture: ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

Training Date: December 30, 2025

Purpose: This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.

Performance Metrics

Test Set Results (2,856 images)

Metric               Score
Accuracy             99.75%
F1 Score             99.77%
Precision            99.61%
Recall               99.93%
AUC-ROC              0.9999
Average Precision    0.9999

Per-Class Performance

              precision    recall  f1-score   support
       Real      99.92%    99.55%    99.73%     1,320
       Fake      99.61%    99.93%    99.77%     1,536

Training Details

  • Total Epochs: 12
  • Final Training Accuracy: 99.92%
  • Final Validation Accuracy: 99.75%
  • Training Time: ~6 minutes on H100 GPU
  • Model Parameters: 24,559,170

Confusion Matrix

Out of 2,856 test images:

  • Real images (1,320): 1,314 correct, 6 misclassified
  • Fake images (1,536): 1,535 correct, 1 misclassified
  • Total errors: Only 7 images (0.25% error rate)
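
The headline metrics can be re-derived from these four counts. The following is a minimal sketch in plain Python, assuming "Fake" is treated as the positive class (which is what the reported precision and recall correspond to); the variable names are illustrative only.

```python
# Counts taken from the confusion matrix above; "Fake" is the positive class.
tn, fp = 1314, 6   # real images: classified as real / wrongly flagged as SDXL
tp, fn = 1535, 1   # SDXL images: correctly flagged / missed

total = tn + fp + tp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)

metrics = {
    "Accuracy": accuracy,
    "Precision": precision,
    "Recall": recall,
    "F1": f1,
    "False positive rate": false_positive_rate,
    "False negative rate": false_negative_rate,
}
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Rounded to two decimals, these values agree with the figures reported in this card (99.75% accuracy, 0.45% false positives, 0.07% false negatives).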

Intended Use

Primary Use Case

Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

What This Model Can Do

✅ Detect SDXL 1.0 generated images (99.75% accuracy on the held-out test set)
✅ Identify SDXL-specific generation patterns and artifacts
✅ Work with 1024×1024 SDXL outputs

What This Model Cannot Do

❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
❌ Work reliably on non-1024×1024 resolutions
❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)

Note: For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
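
One way such an ensemble might be wired together is sketched below. This is only an illustration: the card does not specify a combination rule, so taking the maximum "fake" probability across specialists is an assumption, and the detector names and scores are hypothetical.

```python
def ensemble_fake_probability(specialist_probs):
    """Combine per-generator 'fake' probabilities by taking the maximum.

    specialist_probs maps a (hypothetical) specialist name to the probability
    that the image came from that specialist's generator. Max-combination is
    an assumed rule: the image is suspicious if any specialist fires.
    """
    if not specialist_probs:
        raise ValueError("at least one specialist score is required")
    generator, prob = max(specialist_probs.items(), key=lambda item: item[1])
    return generator, prob


# Hypothetical outputs from three specialist detectors for one image.
scores = {"sdxl": 0.03, "midjourney": 0.91, "dalle": 0.12}
generator, prob = ensemble_fake_probability(scores)
print(f"Most likely generator: {generator} (p_fake={prob:.2f})")
```

Max-combination favors recall over precision, since every added specialist can only raise the combined score; averaging or a learned combiner are alternatives if false positives are the larger concern.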

Training Data

Real Images (9,034 total)

  • Food101: 2,000 images (food photography)
  • AFHQ: 2,000 images (animal faces)
  • Oxford Pets: 2,000 images (pet photography)
  • Stanford Cars: 2,000 images (vehicle photography)
  • Beans: 1,034 images (agricultural images)

All real images were resized to 1024×1024 to match SDXL output dimensions.

Fake Images (10,000 total)

  • Source: SDXL 1.0 generated images
  • Resolution: 1024×1024
  • Dataset: ash12321/sdxl-generated-10k

Data Split

  • Training: 70% (13,323 images)
  • Validation: 15% (2,855 images)
  • Test: 15% (2,856 images)
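
The split sizes above are consistent with truncating the 70% and 15% shares to whole images and assigning the remainder to the test set. The sketch below reproduces that arithmetic; the truncation rule and the `split_sizes` helper are assumptions, not a description of the actual training script.

```python
def split_sizes(n_total, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) sizes; the test set absorbs the rounding remainder."""
    n_train = int(n_total * train_frac)   # truncate toward zero
    n_val = int(n_total * val_frac)
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test


n_total = 9_034 + 10_000  # real + SDXL images
print(split_sizes(n_total))  # (13323, 2855, 2856)
```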

Model Architecture

Base Model: ResNet-50 (pretrained on ImageNet)

Custom Classifier Head:

Sequential(
    Dropout(p=0.3),
    Linear(2048 → 512),
    BatchNorm1d(512),
    ReLU(),
    Dropout(p=0.15),
    Linear(512 → 2)
)

Input: RGB images at 224×224 (at inference: resized to 256 on the shorter side, then center-cropped to 224×224)
Output: Binary classification (Real vs SDXL-generated)
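
As a sanity check, the reported total of 24,559,170 parameters can be reproduced from the layer shapes alone. The sketch below assumes the standard torchvision ResNet-50, which has 25,557,032 parameters including its original 1000-way `fc` layer; the helper functions are illustrative.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * n_features


RESNET50_TOTAL = 25_557_032                               # standard torchvision ResNet-50
backbone = RESNET50_TOTAL - linear_params(2048, 1000)     # drop the original fc layer
head = (
    linear_params(2048, 512)
    + batchnorm_params(512)
    + linear_params(512, 2)
)
total = backbone + head
print(f"backbone={backbone:,} head={head:,} total={total:,}")
```

Dropout and ReLU contribute no parameters, so the custom head adds 1,051,138 parameters on top of the 23,508,032 in the convolutional backbone.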

Training Configuration

Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: 0.001 (with cosine annealing)
  • Batch Size: 128
  • Weight Decay: 0.01
  • Dropout: 0.3 before the first linear layer of the head, 0.15 before the second
  • Label Smoothing: 0.05
  • Mixed Precision: bfloat16 (H100 optimized)
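
For intuition about the learning-rate schedule, the sketch below evaluates the usual closed-form cosine annealing curve. The card does not state the minimum learning rate or whether the schedule steps per epoch or per batch, so a floor of 0 and one step per epoch over the 12 epochs are assumptions here.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-3, total_epochs=12, min_lr=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, min_lr at total_epochs."""
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos_term)


for epoch in range(12):
    print(f"epoch {epoch:2d}: lr = {cosine_annealing_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.001, passes through 0.0005 at the midpoint (epoch 6), and approaches zero by the end of training.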

Augmentation (Training Only)

  • RandomResizedCrop (scale: 0.8-1.0)
  • RandomHorizontalFlip (p=0.5)
  • RandomRotation (±15°)
  • ColorJitter (brightness, contrast, saturation, hue)
  • Normalization (ImageNet stats)

Hardware

  • GPU: NVIDIA H100
  • Training Time: ~6 minutes
  • Inference Speed: ~4ms per image (H100)

Usage

Installation

pip install torch torchvision pillow huggingface_hub

Quick Start

import torch
from torchvision import transforms
from PIL import Image
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth"
)

# Load model
checkpoint = torch.load(model_path, map_location='cpu')

# Create model architecture
import torchvision.models as models
import torch.nn as nn

class SDXLDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # weights=None builds the architecture only; `pretrained=` is deprecated in torchvision >= 0.13
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2)
        )
    
    def forward(self, x):
        return self.backbone(x)

# Initialize and load weights
model = SDXLDetector()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Predict
image = Image.open("test_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results
labels = ['Real', 'SDXL-generated']
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence*100:.2f}%")

Batch Prediction

# Reuses `torch`, `Image`, `model`, and `transform` from the Quick Start above
from torch.utils.data import DataLoader, Dataset

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(image)

# Create dataset and loader
image_paths = ['image1.jpg', 'image2.jpg']  # extend with your own image paths
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []

model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        preds = torch.argmax(probs, dim=1)
        confs = torch.max(probs, dim=1)[0]
        
        predictions.extend(preds.cpu().numpy())
        confidences.extend(confs.cpu().numpy())

Limitations

  1. Generator-Specific: Only trained on SDXL 1.0. Will not reliably detect:

    • Other Stable Diffusion versions (1.5, 2.1, 3.0)
    • Midjourney, DALL-E, Flux
    • Other generative models
  2. Resolution-Specific: Optimized for 1024×1024 SDXL images. Performance may degrade on:

    • Lower resolutions
    • Higher resolutions
    • Non-square aspect ratios
  3. Dataset Bias: Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:

    • Artistic images
    • Abstract images
    • Specialized domains (medical, satellite, etc.)
  4. Adversarial Attacks: Not hardened against adversarial perturbations
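
A deployment could surface the resolution limitation before trusting a prediction. The following is a minimal sketch of such a pre-check; the `scope_warnings` helper and its messages are hypothetical and not part of the released model.

```python
SDXL_NATIVE_SIZE = (1024, 1024)  # resolution the detector was trained and tested on


def scope_warnings(width, height):
    """Return a list of reasons why an image falls outside the tested setting."""
    warnings = []
    if width != height:
        warnings.append("non-square aspect ratio")
    native_pixels = SDXL_NATIVE_SIZE[0] * SDXL_NATIVE_SIZE[1]
    if width * height < native_pixels:
        warnings.append("lower resolution than 1024x1024")
    elif width * height > native_pixels:
        warnings.append("higher resolution than 1024x1024")
    return warnings


for size in [(1024, 1024), (512, 512), (1920, 1080)]:
    print(size, scope_warnings(*size) or "in scope")
```

With PIL, the width and height can be read from `image.size` before the image is passed through the preprocessing transform.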

Ethical Considerations

Intended Applications

✅ Content moderation
✅ Academic research
✅ Digital forensics
✅ Media verification

Prohibited Uses

❌ Surveillance without consent
❌ Discrimination or profiling
❌ Bypassing content policies

False Positives/Negatives

  • False Positives (0.45%): Real images misclassified as SDXL-generated

    • May unfairly flag authentic content
    • Always provide human review for high-stakes decisions
  • False Negatives (0.07%): SDXL images misclassified as real

    • SDXL-generated content may slip through
    • Use as part of multi-layer verification
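
One possible way to act on both error types is to route uncertain predictions to a human reviewer instead of deciding automatically. The sketch below is illustrative only: the `triage` helper and both thresholds are assumptions that would need tuning on your own data, and `p_fake` stands for the softmax probability of the "SDXL-generated" class (`probs[0][1]` in the Quick Start).

```python
def triage(p_fake, flag_threshold=0.95, clear_threshold=0.05):
    """Map the SDXL-class probability to an action; thresholds are hypothetical."""
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    if p_fake >= flag_threshold:
        return "flag_as_sdxl"
    if p_fake <= clear_threshold:
        return "treat_as_real"
    return "human_review"


for p in (0.99, 0.60, 0.01):
    print(f"p_fake={p:.2f} -> {triage(p)}")
```

Widening the gap between the two thresholds sends more images to review and reduces automatic mistakes of both kinds, at the cost of reviewer time.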

Transparency

This model should be deployed with clear communication to users about:

  • Its specific purpose (SDXL detection only)
  • Its limitations (not for other generators)
  • Confidence scores for each prediction
  • The possibility of errors

Citation

If you use this model in your research, please cite:

@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}

Model Card Authors

ash12321

Model Card Contact

For questions or issues, please open an issue on the model repository.

License

MIT License

Changelog

Version 1.0 (2025-12-30)

  • Initial release
  • 99.75% test accuracy on SDXL detection
  • ResNet-50 architecture
  • Trained on 19,034 images (9,034 real + 10,000 SDXL)

Keywords: SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision
