SDXL Detector (ResNet-50)

Model Description

A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

Architecture: ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

Training Date: December 30, 2025

Purpose: This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.

Performance Metrics

Test Set Results (2,856 images)

Metric	Score
Accuracy	99.75%
F1 Score	99.77%
Precision	99.61%
Recall	99.93%
AUC-ROC	0.9999
Average Precision	0.9999

Per-Class Performance

              precision    recall  f1-score   support
       Real      99.92%    99.55%    99.73%     1,320
       Fake      99.61%    99.93%    99.77%     1,536

Training Details

Total Epochs: 12
Final Training Accuracy: 99.92%
Final Validation Accuracy: 99.75%
Training Time: ~6 minutes on H100 GPU
Model Parameters: 24,559,170

Confusion Matrix

Out of 2,856 test images:

Real images (1,320): 1,314 correct, 6 misclassified
Fake images (1,536): 1,535 correct, 1 misclassified
Total errors: 7 images (0.25% error rate)
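The headline metrics can be re-derived from these four counts. The sketch below treats "Fake" as the positive class, which is an assumption that matches the reported precision and recall; the variable names are illustrative only.

```python
# Counts taken from the confusion matrix above ("Fake" is the positive class).
tp = 1535  # fake images correctly flagged
fn = 1     # fake images classified as real
tn = 1314  # real images correctly passed
fp = 6     # real images flagged as fake

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2%} precision={precision:.2%} "
      f"recall={recall:.2%} f1={f1:.2%}")
# accuracy=99.75% precision=99.61% recall=99.93% f1=99.77%
```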

Intended Use

Primary Use Case

Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

What This Model Can Do

✅ Detect SDXL 1.0 generated images (99.75% accuracy on the held-out test set)
✅ Identify SDXL-specific generation patterns and artifacts
✅ Work with 1024×1024 SDXL outputs

What This Model Cannot Do

❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
❌ Work reliably on non-1024×1024 resolutions
❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)

Note: For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
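This card does not prescribe how to combine specialists. One simple option, sketched here as an assumption rather than a recommended design, is a max rule: each specialist is only expected to fire on its own generator, so an image is treated as generated if any one of them reports a high probability. The function name, detector names, and scores are all hypothetical.

```python
from typing import Dict


def ensemble_fake_probability(specialist_probs: Dict[str, float]) -> float:
    """Return the highest 'generated' probability reported by any specialist."""
    if not specialist_probs:
        raise ValueError("at least one specialist score is required")
    return max(specialist_probs.values())


# Hypothetical scores; only "sdxl" corresponds to this model.
scores = {"sdxl": 0.03, "other_generator_a": 0.91, "other_generator_b": 0.12}
combined = ensemble_fake_probability(scores)
flagged_by = max(scores, key=scores.get)
print(f"combined={combined:.2f}, flagged by {flagged_by}")
```

A max rule also accumulates the false positives of every specialist, so the combined false positive rate should be measured on real images before deployment.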

Training Data

Real Images (9,034 total)

Food101: 2,000 images (food photography)
AFHQ: 2,000 images (animal faces)
Oxford Pets: 2,000 images (pet photography)
Stanford Cars: 2,000 images (vehicle photography)
Beans: 1,034 images (agricultural images)

All real images were resized to 1024×1024 to match SDXL output dimensions.

Fake Images (10,000 total)

Source: SDXL 1.0 generated images
Resolution: 1024×1024
Dataset: ash12321/sdxl-generated-10k

Data Split

Training: 70% (13,323 images)
Validation: 15% (2,855 images)
Test: 15% (2,856 images)
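The listed split sizes are consistent with rounding the training and validation counts down and giving the remainder to the test set. The sketch below reproduces that arithmetic; it is an assumption about how the sizes were computed, not the original split script.

```python
total_images = 9_034 + 10_000  # real + SDXL-generated

# Integer arithmetic avoids floating-point rounding surprises.
n_train = total_images * 70 // 100
n_val = total_images * 15 // 100
n_test = total_images - n_train - n_val  # remainder goes to the test split

print(n_train, n_val, n_test)  # 13323 2855 2856
```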

Model Architecture

Base Model: ResNet-50 (pretrained on ImageNet)

Custom Classifier Head:

Sequential(
    Dropout(p=0.3),
    Linear(2048 → 512),
    BatchNorm1d(512),
    ReLU(),
    Dropout(p=0.15),
    Linear(512 → 2)
)

Input: RGB images resized to 224×224
Output: Binary classification (Real vs SDXL-generated)
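As a sanity check, the parameter count reported under Training Details (24,559,170) follows from this architecture. The sketch below counts by hand rather than loading the model; it relies on torchvision's stock ResNet-50 having 25,557,032 parameters in total, including its original 1000-class layer.

```python
# Stock ResNet-50 minus its original 1000-class fc layer (weights + biases).
backbone = 25_557_032 - (2048 * 1000 + 1000)

head = (
    (2048 * 512 + 512)   # Linear(2048 -> 512): weights + biases
    + (512 + 512)        # BatchNorm1d(512): learnable scale + shift
    + (512 * 2 + 2)      # Linear(512 -> 2): weights + biases
)
# Dropout and ReLU have no parameters; BatchNorm running statistics are
# buffers and are not counted as parameters.

print(backbone + head)  # 24559170
```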

Training Configuration

Hyperparameters

Optimizer: AdamW
Learning Rate: 0.001 (with cosine annealing)
Batch Size: 128
Weight Decay: 0.01
Dropout: 0.3 before the first linear layer, 0.15 before the output layer
Label Smoothing: 0.05
Mixed Precision: bfloat16 (H100 optimized)
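To make two of these settings concrete, the sketch below evaluates the standard cosine annealing formula and the smoothed training targets in plain Python. It assumes the schedule is stepped once per epoch over the 12 training epochs and decays to zero; the card does not state either detail, so `EPOCHS` as the schedule length and `ETA_MIN` are assumptions.

```python
import math

BASE_LR = 0.001      # from the hyperparameter list
EPOCHS = 12          # from the training details; assumed schedule length
ETA_MIN = 0.0        # assumption: the card does not state a floor
SMOOTHING = 0.05     # from the hyperparameter list
NUM_CLASSES = 2


def cosine_lr(epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps under cosine annealing."""
    cosine = math.cos(math.pi * epoch / EPOCHS)
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1 + cosine)


# Smoothed targets: the true class gets 1 - eps + eps / K, the rest eps / K.
off_target = SMOOTHING / NUM_CLASSES
on_target = 1.0 - SMOOTHING + off_target

for epoch in (0, 6, 12):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
print(f"targets: true class {on_target:.3f}, other class {off_target:.3f}")
```

With these assumptions the rate starts at 0.001, is halved by epoch 6, and reaches zero at epoch 12, while the training targets become 0.975 and 0.025 instead of 1 and 0.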

Augmentation (Training Only)

RandomResizedCrop (scale: 0.8-1.0)
RandomHorizontalFlip (p=0.5)
RandomRotation (±15°)
ColorJitter (brightness, contrast, saturation, hue)
Normalization (ImageNet stats)

Hardware

GPU: NVIDIA H100
Training Time: ~6 minutes
Inference Speed: ~4ms per image (H100)

Usage

Installation

pip install torch torchvision pillow huggingface_hub

Quick Start

import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms
from PIL import Image
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth"
)

# Load checkpoint onto the CPU
checkpoint = torch.load(model_path, map_location='cpu')

# Create model architecture (must match the definition used in training)
class SDXLDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # weights=None replaces the deprecated pretrained=False argument;
        # the checkpoint supplies all weights, so nothing is downloaded here
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2)
        )
    
    def forward(self, x):
        return self.backbone(x)

# Initialize and load weights
model = SDXLDetector()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Predict
image = Image.open("test_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results
labels = ['Real', 'SDXL-generated']
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence*100:.2f}%")

Batch Prediction

from torch.utils.data import DataLoader, Dataset

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(image)

# Create dataset and loader
image_paths = ['image1.jpg', 'image2.jpg']  # replace with your own image paths
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []

model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        preds = torch.argmax(probs, dim=1)
        confs = torch.max(probs, dim=1)[0]
        
        predictions.extend(preds.cpu().numpy())
        confidences.extend(confs.cpu().numpy())

Limitations

Generator-Specific: Only trained on SDXL 1.0. Will not reliably detect:
- Other Stable Diffusion versions (1.5, 2.1, 3.0)
- Midjourney, DALL-E, Flux
- Other generative models
Resolution-Specific: Optimized for 1024×1024 SDXL images. Performance may degrade on:
- Lower resolutions
- Higher resolutions
- Non-square aspect ratios
Dataset Bias: Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
- Artistic images
- Abstract images
- Specialized domains (medical, satellite, etc.)
Adversarial Attacks: Not hardened against adversarial perturbations

Ethical Considerations

Intended Applications

✅ Content moderation
✅ Academic research
✅ Digital forensics
✅ Media verification

Prohibited Uses

❌ Surveillance without consent
❌ Discrimination or profiling
❌ Bypassing content policies

False Positives/Negatives

False Positives (0.45%): Real images misclassified as SDXL-generated
- May unfairly flag authentic content
- Always provide human review for high-stakes decisions
False Negatives (0.07%): SDXL images misclassified as real
- SDXL-generated content may slip through
- Use as part of multi-layer verification
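One way to act on these recommendations is to route predictions by confidence. The sketch below is a hypothetical policy, not part of the released model: the `triage` function, the 0.90 threshold, and the route names are placeholders that should be tuned on your own data.

```python
def triage(label: str, confidence: float,
           review_below: float = 0.90, high_stakes: bool = False) -> str:
    """Route a prediction to automatic handling or to a human reviewer."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    if confidence < review_below:
        return "human_review"  # uncertain either way
    if label == "SDXL-generated":
        # In high-stakes settings a flag is never acted on automatically.
        return "human_review" if high_stakes else "auto_flag"
    return "auto_pass"


print(triage("SDXL-generated", 0.99))                    # auto_flag
print(triage("SDXL-generated", 0.99, high_stakes=True))  # human_review
print(triage("SDXL-generated", 0.72))                    # human_review
print(triage("Real", 0.95))                              # auto_pass
```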

Transparency

This model should be deployed with clear communication to users about:

Its specific purpose (SDXL detection only)
Its limitations (not for other generators)
Confidence scores for each prediction
The possibility of errors

Citation

If you use this model in your research, please cite:

@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}

Model Card Authors

ash12321

Model Card Contact

For questions or issues, please open an issue on the model repository.

License

MIT License

Changelog

Version 1.0 (2025-12-30)

Initial release
99.75% test accuracy on SDXL detection
ResNet-50 architecture
Trained on 19,034 images (9,034 real + 10,000 SDXL)

Keywords: SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision
