---
language: en
license: mit
tags:
- image-classification
- fake-detection
- sdxl
- ai-detection
- deepfake-detection
datasets:
- food101
- huggan/AFHQ
- timm/oxford-iiit-pet
- tanganke/stanford_cars
- beans
- ash12321/sdxl-generated-10k
metrics:
- accuracy
- f1
- precision
- recall
- auc
library_name: pytorch
pipeline_tag: image-classification
---
# SDXL Detector (ResNet-50)
## Model Description
A specialist binary classifier that detects images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
**Architecture:** ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)
**Training Date:** December 30, 2025
**Purpose:** This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.
## Performance Metrics
### Test Set Results (2,856 images)
| Metric | Score |
|--------|-------|
| **Accuracy** | **99.75%** |
| **F1 Score** | **99.77%** |
| **Precision** | **99.61%** |
| **Recall** | **99.93%** |
| **AUC-ROC** | **0.9999** |
| **Average Precision** | **0.9999** |
### Per-Class Performance
```
              precision    recall  f1-score   support
        Real     99.92%    99.55%    99.73%     1,320
        Fake     99.61%    99.93%    99.77%     1,536
```
### Training Details
- **Total Epochs:** 12
- **Final Training Accuracy:** 99.92%
- **Final Validation Accuracy:** 99.75%
- **Training Time:** ~6 minutes on H100 GPU
- **Model Parameters:** 24,559,170
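The parameter count can be cross-checked by hand. The sketch below assumes the standard torchvision ResNet-50 (25,557,032 parameters including its 1000-class `fc` layer) and replaces that layer with the custom head listed under Model Architecture. The helper names are illustrative only.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def batchnorm1d_params(num_features: int) -> int:
    """Learnable scale and shift (running statistics are buffers, not parameters)."""
    return 2 * num_features


# Assumption: stock torchvision ResNet-50, minus its original 2048 -> 1000 classifier
RESNET50_TOTAL = 25_557_032
backbone_params = RESNET50_TOTAL - linear_params(2048, 1000)

# Custom head: Linear(2048, 512) -> BatchNorm1d(512) -> Linear(512, 2)
head_params = (
    linear_params(2048, 512)
    + batchnorm1d_params(512)
    + linear_params(512, 2)
)

total_params = backbone_params + head_params
print(f"{total_params:,}")  # 24,559,170
```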
### Confusion Matrix
Out of 2,856 test images:
- **Real images (1,320):** 1,314 correct, 6 misclassified
- **Fake images (1,536):** 1,535 correct, 1 misclassified
- **Total errors:** Only 7 images (0.25% error rate)
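The headline metrics follow directly from these four counts. A minimal sketch, treating "Fake" as the positive class (variable names are illustrative):

```python
# Confusion-matrix counts from the test set ("Fake" is the positive class)
tp, fn = 1535, 1  # SDXL images: correctly flagged / missed
tn, fp = 1314, 6  # real images: correctly passed / wrongly flagged


def pct(value: float) -> float:
    """Express a ratio as a percentage rounded to two decimals."""
    return round(100 * value, 2)


accuracy = (tp + tn) / (tp + tn + fp + fn)

# Fake (positive) class
fake_precision = tp / (tp + fp)
fake_recall = tp / (tp + fn)
fake_f1 = 2 * fake_precision * fake_recall / (fake_precision + fake_recall)

# Real (negative) class
real_precision = tn / (tn + fn)
real_recall = tn / (tn + fp)
real_f1 = 2 * real_precision * real_recall / (real_precision + real_recall)

print(pct(accuracy))                                         # 99.75
print(pct(fake_precision), pct(fake_recall), pct(fake_f1))   # 99.61 99.93 99.77
print(pct(real_precision), pct(real_recall), pct(real_f1))   # 99.92 99.55 99.73
```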
## Intended Use
### Primary Use Case
Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
### What This Model Can Do
✅ Detect SDXL 1.0 generated images (99.75% accuracy on the held-out test set)
✅ Identify SDXL-specific generation patterns and artifacts
✅ Work with 1024×1024 SDXL outputs
### What This Model Cannot Do
❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
❌ Work reliably on non-1024×1024 resolutions
❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)
**Note:** For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
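This card does not specify how such an ensemble should combine its members. One simple option, sketched here as an assumption, is to flag an image when any specialist's fake probability reaches a threshold. The specialist names, scores, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, Tuple


def combine_specialists(
    fake_probs: Dict[str, float], threshold: float = 0.5
) -> Tuple[bool, str, float]:
    """Flag an image if any specialist's fake probability reaches the threshold.

    Returns (flagged, most_confident_specialist, its_probability).
    """
    if not fake_probs:
        raise ValueError("need at least one specialist score")
    name, prob = max(fake_probs.items(), key=lambda item: item[1])
    return prob >= threshold, name, prob


# Hypothetical per-specialist probabilities that one image is generated
scores = {"sdxl": 0.98, "midjourney": 0.12, "flux": 0.07}
flagged, source, prob = combine_specialists(scores)
print(flagged, source, prob)  # True sdxl 0.98
```

A max rule favors recall: one confident specialist is enough to flag an image, at the cost of accumulating each member's false positives.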
## Training Data
### Real Images (9,034 total)
- **Food101:** 2,000 images (food photography)
- **AFHQ:** 2,000 images (animal faces)
- **Oxford Pets:** 2,000 images (pet photography)
- **Stanford Cars:** 2,000 images (vehicle photography)
- **Beans:** 1,034 images (agricultural images)
All real images were resized to 1024×1024 to match SDXL output dimensions.
### Fake Images (10,000 total)
- **Source:** SDXL 1.0 generated images
- **Resolution:** 1024×1024
- **Dataset:** ash12321/sdxl-generated-10k
### Data Split
- Training: 70% (13,323 images)
- Validation: 15% (2,855 images)
- Test: 15% (2,856 images)
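The split sizes are consistent with truncating the 70% and 15% shares of the 19,034 images and giving the remainder to the test set. The sketch below reproduces those sizes. The shuffling procedure and the seed are assumptions, since the card does not state how the split was drawn or whether it was stratified.

```python
import random


def split_indices(n: int, train_frac: float = 0.70, val_frac: float = 0.15, seed: int = 42):
    """Shuffle indices 0..n-1 and cut them into train / validation / test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical choice
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(9_034 + 10_000)
print(len(train), len(val), len(test))  # 13323 2855 2856
```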
## Model Architecture
**Base Model:** ResNet-50 (pretrained on ImageNet)
**Custom Classifier Head:**
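```python
import torch.nn as nn

# Replaces the ResNet-50 `fc` layer (2048 input features)
classifier_head = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(2048, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.15),
    nn.Linear(512, 2),  # logits: [Real, SDXL-generated]
)
```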
**Input:** RGB images, resized and cropped to 224×224 and normalized with ImageNet statistics
**Output:** Binary classification (Real vs SDXL-generated)
## Training Configuration
### Hyperparameters
- **Optimizer:** AdamW
- **Learning Rate:** 0.001 (with cosine annealing)
- **Batch Size:** 128
- **Weight Decay:** 0.01
- **Dropout:** 0.3 before the first linear layer of the head, 0.15 before the second
- **Label Smoothing:** 0.05
- **Mixed Precision:** bfloat16 (H100 optimized)
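To make two of these settings concrete, the sketch below evaluates a cosine-annealed learning rate and the label-smoothed targets in plain Python. It assumes the schedule is stepped once per epoch over the 12 epochs and decays to zero, and that smoothing mixes the one-hot target with a uniform distribution (the convention used by PyTorch's `CrossEntropyLoss`). The card confirms neither detail, and the function names are illustrative.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 12, base_lr: float = 1e-3, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given epoch (assumed per-epoch stepping)."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


def smoothed_targets(label: int, num_classes: int = 2, smoothing: float = 0.05) -> list:
    """Mix the one-hot target with a uniform distribution over the classes."""
    uniform_share = smoothing / num_classes
    return [
        (1 - smoothing) + uniform_share if k == label else uniform_share
        for k in range(num_classes)
    ]


for epoch in (0, 6, 12):
    print(epoch, f"{cosine_lr(epoch):.6f}")

# Label 1 = SDXL-generated: target is roughly [0.025, 0.975]
print(smoothed_targets(1))
```

With a smoothing value of 0.05 and two classes, the model is trained toward 97.5% confidence rather than 100%, which tends to discourage overconfident logits.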
### Augmentation (Training Only)
- RandomResizedCrop (scale: 0.8-1.0)
- RandomHorizontalFlip (p=0.5)
- RandomRotation (±15°)
- ColorJitter (brightness, contrast, saturation, hue)
- Normalization (ImageNet stats)
### Hardware
- **GPU:** NVIDIA H100
- **Training Time:** ~6 minutes
- **Inference Speed:** ~4ms per image (H100)
## Usage
### Installation
```bash
pip install torch torchvision pillow huggingface_hub
```
### Quick Start
```python
import torch
import torch.nn as nn
import torchvision.models as models
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth",
)

# Load the checkpoint on CPU
checkpoint = torch.load(model_path, map_location="cpu")


class SDXLDetector(nn.Module):
    """ResNet-50 backbone with the custom two-class head."""

    def __init__(self):
        super().__init__()
        # weights=None replaces the deprecated pretrained=False (torchvision >= 0.13);
        # the fine-tuned weights are loaded from the checkpoint below
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2),
        )

    def forward(self, x):
        return self.backbone(x)


# Initialize the model and load the fine-tuned weights
model = SDXLDetector()
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Preprocessing (evaluation-time transform)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Predict on a single image
image = Image.open("test_image.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results (class 0 = Real, class 1 = SDXL-generated)
labels = ["Real", "SDXL-generated"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence * 100:.2f}%")
```
### Batch Prediction
```python
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

# Reuses `model` and `transform` from the Quick Start example


class ImageDataset(Dataset):
    """Loads images from a list of file paths and applies the transform."""

    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(image)


# Create dataset and loader
image_paths = ["image1.jpg", "image2.jpg"]  # ... add the rest of your paths
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []

model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        confs, preds = probs.max(dim=1)  # top probability and its class index
        predictions.extend(preds.tolist())
        confidences.extend(confs.tolist())
```
## Limitations
1. **Generator-Specific:** Only trained on SDXL 1.0. Will not reliably detect:
   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
   - Midjourney, DALL-E, Flux
   - Other generative models
2. **Resolution-Specific:** Optimized for 1024×1024 SDXL images. Performance may degrade on:
   - Lower resolutions
   - Higher resolutions
   - Non-square aspect ratios
3. **Dataset Bias:** Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
   - Artistic images
   - Abstract images
   - Specialized domains (medical, satellite, etc.)
4. **Adversarial Attacks:** Not hardened against adversarial perturbations
## Ethical Considerations
### Intended Applications
✅ Content moderation
✅ Academic research
✅ Digital forensics
✅ Media verification
### Prohibited Uses
❌ Surveillance without consent
❌ Discrimination or profiling
❌ Bypassing content policies
### False Positives/Negatives
- **False Positives (0.45% of real test images):** Real images misclassified as SDXL-generated
  - May unfairly flag authentic content
  - Always provide human review for high-stakes decisions
- **False Negatives (0.07% of SDXL test images):** SDXL images misclassified as real
  - SDXL-generated content may slip through
  - Use as part of multi-layer verification
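These rates were measured on a test set in which about 54% of images are SDXL-generated. When generated images are rare, false positives make up a much larger share of all flags. The sketch below applies Bayes' rule to the test-set error rates. It assumes those rates carry over unchanged to deployment data, which the Limitations section suggests may not hold, and the function name is illustrative.

```python
def flag_precision(prevalence: float, tpr: float = 1535 / 1536, fpr: float = 6 / 1320) -> float:
    """Share of flagged images that are truly SDXL-generated at a given prevalence.

    tpr and fpr default to the test-set true-positive and false-positive rates.
    """
    true_flags = tpr * prevalence
    false_flags = fpr * (1 - prevalence)
    return true_flags / (true_flags + false_flags)


# Share of flags that are correct at different fractions of SDXL images
for prevalence in (1536 / 2856, 0.10, 0.01):
    print(f"{prevalence:.2%} SDXL -> {flag_precision(prevalence):.2%} of flags correct")
```

At the test-set mix this reproduces the reported 99.61% precision. If only 1% of incoming images were SDXL-generated, roughly 69% of flags would be correct under the same assumptions, which is why human review of flagged content matters.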
### Transparency
This model should be deployed with clear communication to users about:
- Its specific purpose (SDXL detection only)
- Its limitations (not for other generators)
- Confidence scores for each prediction
- The possibility of errors
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{sdxl_detector_2025,
author = {ash12321},
title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}
```
## Model Card Authors
ash12321
## Model Card Contact
For questions or issues, please open an issue on the model repository.
## License
MIT License
## Changelog
### Version 1.0 (2025-12-30)
- Initial release
- 99.75% test accuracy on SDXL detection
- ResNet-50 architecture
- Trained on 19,034 images (9,034 real + 10,000 SDXL)
---
**Keywords:** SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision