SDXL Detector (ResNet-50)
Model Description
A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
Architecture: ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)
Training Date: December 30, 2025
Purpose: This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.
Performance Metrics
Test Set Results (2,856 images)
| Metric | Score |
|---|---|
| Accuracy | 99.75% |
| F1 Score | 99.77% |
| Precision | 99.61% |
| Recall | 99.93% |
| AUC-ROC | 0.9999 |
| Average Precision | 0.9999 |
Per-Class Performance
| Class | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Real | 99.92% | 99.55% | 99.73% | 1,320 |
| Fake | 99.61% | 99.93% | 99.77% | 1,536 |
Training Details
- Total Epochs: 12
- Final Training Accuracy: 99.92%
- Final Validation Accuracy: 99.75%
- Training Time: ~6 minutes on H100 GPU
- Model Parameters: 24,559,170
Confusion Matrix
Out of 2,856 test images:
- Real images (1,320): 1,314 correct, 6 misclassified
- Fake images (1,536): 1,535 correct, 1 misclassified
- Total errors: Only 7 images (0.25% error rate)
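The headline metrics follow directly from these four counts. As a quick check, the sketch below recomputes them, treating "fake" as the positive class (an assumption, though it is the reading consistent with the per-class figures above). The variable names are illustrative only.

```python
# Confusion-matrix counts from the test set, with "fake" as the positive class.
tp = 1535  # fake images correctly flagged
fn = 1     # fake images predicted as real
tn = 1314  # real images correctly passed
fp = 6     # real images predicted as fake

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("f1", f1),
    ("false positive rate", false_positive_rate),
    ("false negative rate", false_negative_rate),
]:
    print(f"{name:>20}: {value:.2%}")
```

The printed values round to the figures reported in this card, including the 0.45% false positive rate and 0.07% false negative rate.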
Intended Use
Primary Use Case
Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
What This Model Can Do
✅ Detect SDXL 1.0 generated images with 99.75% accuracy
✅ Identify SDXL-specific generation patterns and artifacts
✅ Work with 1024×1024 SDXL outputs
What This Model Cannot Do
❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
❌ Work reliably on non-1024×1024 resolutions
❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)
Note: For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
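This card does not specify how such an ensemble should combine its members. One simple option, shown here purely as an assumption, is to take the highest "generated" probability reported by any specialist, so an image is flagged if any single detector recognizes it. The function name and the scores for the other generators are hypothetical.

```python
from typing import Dict


def ensemble_fake_probability(specialist_probs: Dict[str, float]) -> float:
    """Return the highest 'generated' probability reported by any specialist."""
    if not specialist_probs:
        raise ValueError("at least one specialist score is required")
    return max(specialist_probs.values())


# Hypothetical scores; only the "sdxl" entry would come from this model.
scores = {"sdxl": 0.98, "midjourney": 0.04, "dalle": 0.11}
combined = ensemble_fake_probability(scores)
flagged_by = max(scores, key=scores.get)
print(f"combined={combined:.2f}, flagged by {flagged_by}")
```

A max rule favors recall over precision, since false positives from every member add up. Averaging or a learned combiner are alternatives worth evaluating on your own data.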
Training Data
Real Images (9,034 total)
- Food101: 2,000 images (food photography)
- AFHQ: 2,000 images (animal faces)
- Oxford Pets: 2,000 images (pet photography)
- Stanford Cars: 2,000 images (vehicle photography)
- Beans: 1,034 images (agricultural images)
All real images were resized to 1024×1024 to match SDXL output dimensions.
Fake Images (10,000 total)
- Source: SDXL 1.0 generated images
- Resolution: 1024×1024
- Dataset: ash12321/sdxl-generated-10k
Data Split
- Training: 70% (13,323 images)
- Validation: 15% (2,855 images)
- Test: 15% (2,856 images)
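The rounding rule behind these counts is not stated. Flooring the training and validation shares and giving the remainder to the test set reproduces them exactly, so the sketch below uses that rule as an assumption. `split_sizes` is an illustrative helper, not part of the released code.

```python
def split_sizes(n_total: int, train_pct: int = 70, val_pct: int = 15):
    """Floor the train and validation shares; the test set takes the remainder."""
    n_train = n_total * train_pct // 100
    n_val = n_total * val_pct // 100
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test


n_real, n_fake = 9_034, 10_000
sizes = split_sizes(n_real + n_fake)
print(sizes)  # (13323, 2855, 2856)
```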
Model Architecture
Base Model: ResNet-50 (pretrained on ImageNet)
Custom Classifier Head:
Sequential(
    Dropout(p=0.3),
    Linear(2048 → 512),
    BatchNorm1d(512),
    ReLU(),
    Dropout(p=0.15),
    Linear(512 → 2)
)
Input: RGB images resized to 224×224
Output: Binary classification (Real vs SDXL-generated)
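The parameter count of 24,559,170 reported under Training Details can be checked by hand from this architecture. The sketch below starts from the standard torchvision ResNet-50 total of 25,557,032 parameters, removes the original 1000-class layer, and adds the custom head. Dropout and ReLU contribute no parameters, and BatchNorm running statistics are buffers rather than parameters.

```python
# torchvision's ResNet-50 has 25,557,032 parameters, including its original
# 2048 -> 1000 fully connected layer.
RESNET50_TOTAL = 25_557_032


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


backbone = RESNET50_TOTAL - linear_params(2048, 1000)

head = (
    linear_params(2048, 512)
    + 2 * 512  # BatchNorm1d(512): learnable weight and bias
    + linear_params(512, 2)
)

total = backbone + head
print(f"{total:,}")  # 24,559,170
```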
Training Configuration
Hyperparameters
- Optimizer: AdamW
- Learning Rate: 0.001 (with cosine annealing)
- Batch Size: 128
- Weight Decay: 0.01
- Dropout: 0.3 before the first classifier layer, 0.15 before the final layer
- Label Smoothing: 0.05
- Mixed Precision: bfloat16 (H100 optimized)
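To make two of these settings concrete, the sketch below evaluates a cosine-annealed learning rate over the 12 training epochs and builds the soft targets that a label smoothing value of 0.05 produces for two classes. It assumes per-epoch scheduler steps and a minimum learning rate of 0, neither of which is stated in this card. The smoothing formula is the one PyTorch's `CrossEntropyLoss` uses.

```python
import math

BASE_LR = 1e-3    # from the hyperparameter list
EPOCHS = 12       # from the training details
ETA_MIN = 0.0     # assumption: the minimum learning rate is not stated
SMOOTHING = 0.05  # from the hyperparameter list
NUM_CLASSES = 2


def cosine_lr(epoch: int) -> float:
    """Cosine-annealed learning rate at the start of the given epoch."""
    cos_term = 1 + math.cos(math.pi * epoch / EPOCHS)
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * cos_term


def smoothed_target(true_class: int) -> list:
    """Soft target: (1 - eps) on the true class plus eps / K on every class."""
    uniform = SMOOTHING / NUM_CLASSES
    return [
        (1 - SMOOTHING) * (1.0 if k == true_class else 0.0) + uniform
        for k in range(NUM_CLASSES)
    ]


schedule = [cosine_lr(epoch) for epoch in range(EPOCHS)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")

print(smoothed_target(1))  # approximately [0.025, 0.975]
```

Under these assumptions the rate starts at 0.001, reaches half that value at epoch 6, and approaches zero by the final epoch.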
Augmentation (Training Only)
- RandomResizedCrop (scale: 0.8-1.0)
- RandomHorizontalFlip (p=0.5)
- RandomRotation (±15°)
- ColorJitter (brightness, contrast, saturation, hue)
- Normalization (ImageNet stats)
Hardware
- GPU: NVIDIA H100
- Training Time: ~6 minutes
- Inference Speed: ~4ms per image (H100)
Usage
Installation
pip install torch torchvision pillow huggingface_hub
Quick Start
import torch
from torchvision import transforms
from PIL import Image
from huggingface_hub import hf_hub_download
# Download model
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth"
)
# Load checkpoint
checkpoint = torch.load(model_path, map_location='cpu')
# Create model architecture
import torchvision.models as models
import torch.nn as nn
class SDXLDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # weights=None replaces the deprecated pretrained=False;
        # the fine-tuned weights are loaded from the checkpoint
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features
        # Replace the ImageNet classifier with the custom two-class head
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2)
        )
    def forward(self, x):
        return self.backbone(x)
# Initialize and load weights
model = SDXLDetector()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
# Predict
image = Image.open("test_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0)
with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
# Results
labels = ['Real', 'SDXL-generated']
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence*100:.2f}%")
Batch Prediction
from torch.utils.data import DataLoader, Dataset
# Reuses Image, transform, and model from Quick Start
class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(image)
# Create dataset and loader
image_paths = ['image1.jpg', 'image2.jpg', ...]  # placeholder: replace with your own image paths
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)
# Batch inference
predictions = []
confidences = []
model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        # torch.max returns (values, indices): confidence and predicted class
        confs, preds = torch.max(probs, dim=1)
        predictions.extend(preds.tolist())
        confidences.extend(confs.tolist())
Limitations
Generator-Specific: Only trained on SDXL 1.0. Will not reliably detect:
- Other Stable Diffusion versions (1.5, 2.1, 3.0)
- Midjourney, DALL-E, Flux
- Other generative models
Resolution-Specific: Optimized for 1024×1024 SDXL images. Performance may degrade on:
- Lower resolutions
- Higher resolutions
- Non-square aspect ratios
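One way to surface this limitation at inference time is to check the source image dimensions before trusting a prediction. The helper below is a hypothetical sketch, not part of the released code. It takes a width and height (for example from `PIL.Image.size`) and returns a warning for any of the three cases listed above.

```python
EXPECTED_SIZE = (1024, 1024)


def resolution_warning(width: int, height: int):
    """Return a warning string for off-spec inputs, or None if the size matches."""
    if (width, height) == EXPECTED_SIZE:
        return None
    if width != height:
        return f"{width}x{height} is not square; detector output may be unreliable"
    relation = "lower" if width < EXPECTED_SIZE[0] else "higher"
    return (
        f"{width}x{height} is {relation} than 1024x1024; "
        "detector output may be unreliable"
    )


for size in [(1024, 1024), (512, 512), (2048, 2048), (1024, 768)]:
    print(size, "->", resolution_warning(*size))
```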
Dataset Bias: Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
- Artistic images
- Abstract images
- Specialized domains (medical, satellite, etc.)
Adversarial Attacks: Not hardened against adversarial perturbations
Ethical Considerations
Intended Applications
✅ Content moderation
✅ Academic research
✅ Digital forensics
✅ Media verification
Prohibited Uses
❌ Surveillance without consent
❌ Discrimination or profiling
❌ Bypassing content policies
False Positives/Negatives
False Positives (0.45%): Real images misclassified as SDXL-generated
- May unfairly flag authentic content
- Always provide human review for high-stakes decisions
False Negatives (0.07%): SDXL images misclassified as real
- SDXL-generated content may slip through
- Use as part of multi-layer verification
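A minimal sketch of how human review might be wired in, assuming the label strings from Quick Start. The 0.90 threshold and the outcome names are hypothetical and should be tuned on your own validation data. Low-confidence predictions always go to a reviewer, and in high-stakes settings so does every image flagged as generated.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cutoff; tune on your own validation data


def triage(label: str, confidence: float, high_stakes: bool = False) -> str:
    """Route a prediction to an automatic outcome or to a human reviewer."""
    if confidence < REVIEW_THRESHOLD:
        return "human_review"
    if label == "SDXL-generated":
        # A flag can unfairly penalize authentic content, so high-stakes
        # flags are never acted on automatically.
        return "human_review" if high_stakes else "auto_flag"
    return "auto_pass"


print(triage("Real", 0.99))                              # auto_pass
print(triage("SDXL-generated", 0.97))                    # auto_flag
print(triage("SDXL-generated", 0.97, high_stakes=True))  # human_review
print(triage("Real", 0.62))                              # human_review
```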
Transparency
This model should be deployed with clear communication to users about:
- Its specific purpose (SDXL detection only)
- Its limitations (not for other generators)
- Confidence scores for each prediction
- The possibility of errors
Citation
If you use this model in your research, please cite:
@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}}
}
Model Card Authors
ash12321
Model Card Contact
For questions or issues, please open an issue on the model repository.
License
MIT License
Changelog
Version 1.0 (2025-12-30)
- Initial release
- 99.75% test accuracy on SDXL detection
- ResNet-50 architecture
- Trained on 19,034 images (9,034 real + 10,000 SDXL)
Keywords: SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision