---
language: en
license: mit
tags:
- image-classification
- fake-detection
- sdxl
- ai-detection
- deepfake-detection
datasets:
- food101
- huggan/AFHQ
- timm/oxford-iiit-pet
- tanganke/stanford_cars
- beans
- ash12321/sdxl-generated-10k
metrics:
- accuracy
- f1
- precision
- recall
- auc
library_name: pytorch
pipeline_tag: image-classification
---

# SDXL Detector (ResNet-50)

## Model Description

A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

**Architecture:** ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

**Training Date:** December 30, 2025

**Purpose:** This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.

## Performance Metrics

### Test Set Results (2,856 images)

| Metric | Score |
|--------|-------|
| **Accuracy** | **99.75%** |
| **F1 Score** | **99.77%** |
| **Precision** | **99.61%** |
| **Recall** | **99.93%** |
| **AUC-ROC** | **0.9999** |
| **Average Precision** | **0.9999** |

### Per-Class Performance

```
              precision    recall  f1-score   support
       Real      99.92%    99.55%    99.73%     1,320
       Fake      99.61%    99.93%    99.77%     1,536
```

### Training Details

- **Total Epochs:** 12
- **Final Training Accuracy:** 99.92%
- **Final Validation Accuracy:** 99.75%
- **Training Time:** ~6 minutes on H100 GPU
- **Model Parameters:** 24,559,170

### Confusion Matrix

Out of 2,856 test images:

| Actual | Predicted Real | Predicted SDXL | Total |
|--------|----------------|----------------|-------|
| **Real** | 1,314 | 6 | 1,320 |
| **SDXL-generated** | 1 | 1,535 | 1,536 |

**Total errors:** 7 of 2,856 images (0.25% error rate)
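As a cross-check, the headline metrics can be recomputed from the confusion-matrix counts, treating SDXL-generated as the positive class. This is plain arithmetic on the reported counts, not the author's evaluation script:

```python
# Confusion-matrix counts from this card (positive class = SDXL-generated)
tp = 1535  # SDXL images predicted as SDXL
fn = 1     # SDXL images predicted as real
tn = 1314  # real images predicted as real
fp = 6     # real images predicted as SDXL

def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of real images flagged as SDXL
        "fnr": fn / (fn + tp),  # share of SDXL images passed as real
    }

for name, value in metrics(tp, fp, tn, fn).items():
    print(f"{name:>9}: {value * 100:.2f}%")
```

The printed values match the Fake row of the per-class table and the false positive/negative rates quoted under Ethical Considerations.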

## Intended Use

### Primary Use Case
Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

### What This Model Can Do
✅ Detect SDXL 1.0-generated images (99.75% accuracy on the held-out test set)  
✅ Identify SDXL-specific generation patterns and artifacts  
✅ Work with 1024×1024 SDXL outputs  

### What This Model Cannot Do
❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)  
❌ Work reliably on non-1024×1024 resolutions  
❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)  

**Note:** For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
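A minimal sketch of one way to combine specialists, assuming each detector is a callable that returns the probability that an image came from its target generator. The detector names and the `ensemble_fake_probability` helper are hypothetical; max-rule fusion is one simple choice, not a method this card prescribes:

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: each specialist maps an image to P(generated by its target model)
Detector = Callable[[object], float]

def ensemble_fake_probability(
    image: object, detectors: Dict[str, Detector]
) -> Tuple[float, str, Dict[str, float]]:
    """Max-rule fusion: the image is as suspicious as its most confident specialist."""
    scores = {name: detect(image) for name, detect in detectors.items()}
    top = max(scores, key=scores.get)
    return scores[top], top, scores
```

Taking the maximum keeps one confident specialist from being outvoted by detectors that never saw that generator, which averaging would do. The trade-off is that false positives from any single specialist also pass through.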

## Training Data

### Real Images (9,034 total)
- **Food101:** 2,000 images (food photography)
- **AFHQ:** 2,000 images (animal faces)
- **Oxford Pets:** 2,000 images (pet photography)
- **Stanford Cars:** 2,000 images (vehicle photography)
- **Beans:** 1,034 images (agricultural images)

All real images were resized to 1024×1024 to match SDXL output dimensions.
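The card does not say how the resizing was done. A minimal Pillow sketch, assuming bicubic resampling and a direct resize (which stretches non-square sources rather than cropping them); `resize_to_sdxl_size` is a hypothetical helper:

```python
from PIL import Image

SDXL_SIZE = (1024, 1024)  # SDXL 1.0 native output resolution

def resize_to_sdxl_size(image: Image.Image) -> Image.Image:
    """Convert to RGB and resize to 1024x1024.

    Assumption: bicubic resampling and no aspect-preserving crop; the card
    specifies neither.
    """
    return image.convert("RGB").resize(SDXL_SIZE, resample=Image.BICUBIC)
```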

### Fake Images (10,000 total)
- **Source:** SDXL 1.0 generated images
- **Resolution:** 1024×1024
- **Dataset:** ash12321/sdxl-generated-10k

### Data Split
- Training: 70% (13,323 images)
- Validation: 15% (2,855 images)
- Test: 15% (2,856 images)
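These counts are consistent with truncating the 70% and 15% shares of 19,034 images and assigning the remainder to the test set. A sketch of such a split, assuming a plain shuffled (non-stratified) split and a hypothetical seed; the card does not publish its splitting code:

```python
import random

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test lists."""
    train_size = int(train_frac * n)        # truncated, e.g. 13,323.8 -> 13,323
    val_size = int(val_frac * n)            # truncated, e.g. 2,855.1 -> 2,855
    indices = list(range(n))
    random.Random(seed).shuffle(indices)    # seed is a hypothetical choice
    train = indices[:train_size]
    val = indices[train_size:train_size + val_size]
    test = indices[train_size + val_size:]  # remainder goes to test
    return train, val, test

train, val, test = split_indices(9_034 + 10_000)
print(len(train), len(val), len(test))
```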

## Model Architecture

**Base Model:** ResNet-50 (pretrained on ImageNet)

**Custom Classifier Head:**
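```python
import torch.nn as nn

# Replaces ResNet-50's original 1000-class `fc` layer (input: 2048 pooled features)
nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(2048, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(),
    nn.Dropout(p=0.15),
    nn.Linear(512, 2),
)
```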

**Input:** RGB images at 224×224 (at inference: resize the shorter side to 256, then center-crop to 224)  
**Output:** Binary classification (Real vs SDXL-generated)
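The stated parameter count (24,559,170) can be reproduced by hand. torchvision's ResNet-50 has 25,557,032 parameters including its 1000-class `fc` layer; swapping that layer for the custom head gives:

```python
def linear_params(n_in, n_out):
    """Weights plus bias of an nn.Linear layer."""
    return n_in * n_out + n_out

RESNET50_TOTAL = 25_557_032              # torchvision resnet50, 1000-class fc included
imagenet_fc = linear_params(2048, 1000)  # 2,049,000 (removed)
backbone = RESNET50_TOTAL - imagenet_fc  # 23,508,032

head = (
    linear_params(2048, 512)  # 1,049,088
    + 2 * 512                 # BatchNorm1d affine weight + bias
    + linear_params(512, 2)   # 1,026
)
total = backbone + head
print(f"{total:,}")
```

Dropout and ReLU contribute no parameters, and BatchNorm1d's running mean and variance are buffers rather than parameters, so `sum(p.numel() for p in model.parameters())` on the Quick Start model should report the same total.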

## Training Configuration

### Hyperparameters
- **Optimizer:** AdamW
- **Learning Rate:** 0.001 initial, with cosine annealing
- **Batch Size:** 128
- **Weight Decay:** 0.01
- **Dropout:** 0.3 (second classifier-head dropout: 0.15)
- **Label Smoothing:** 0.05
- **Mixed Precision:** bfloat16 (H100 optimized)
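A minimal PyTorch sketch of how these hyperparameters map onto standard APIs. Assumptions not stated in the card: the loss is `nn.CrossEntropyLoss`, the cosine schedule spans all 12 epochs with one scheduler step per epoch, and AdamW's other settings stay at their defaults. `build_training_setup` and `train_step` are hypothetical helpers, not the author's training code:

```python
import torch
import torch.nn as nn

def build_training_setup(model: nn.Module, epochs: int = 12):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
    # Assumption: cosine decay over the whole run, stepped once per epoch
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.05)
    return optimizer, scheduler, criterion

def train_step(model, batch, labels, optimizer, criterion, device_type="cuda"):
    """One optimization step under bfloat16 autocast; returns the loss value."""
    model.train()
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        logits = model(batch)
        loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

bfloat16 keeps float32's exponent range, so unlike float16 autocast it is normally used without a `GradScaler`.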

### Augmentation (Training Only)
- RandomResizedCrop (scale: 0.8-1.0)
- RandomHorizontalFlip (p=0.5)
- RandomRotation (±15°)
- ColorJitter (brightness, contrast, saturation, hue)
- Normalization (ImageNet stats)

### Hardware
- **GPU:** NVIDIA H100
- **Training Time:** ~6 minutes
- **Inference Speed:** ~4ms per image (H100)

## Usage

### Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

### Quick Start

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms
from PIL import Image
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth"
)

# Load the checkpoint dict on CPU. On PyTorch >= 2.6, torch.load defaults to
# weights_only=True; if loading fails and you trust this file, pass weights_only=False.
checkpoint = torch.load(model_path, map_location='cpu')

# Recreate the architecture used in training
class SDXLDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # weights=None: the fine-tuned weights come from the checkpoint, not ImageNet
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features  # 2048
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2)
        )

    def forward(self, x):
        return self.backbone(x)

# Initialize and load weights
model = SDXLDetector()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # required: BatchNorm1d and Dropout behave differently in training mode

# Preprocessing (ImageNet normalization)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Predict
image = Image.open("test_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results (class index 0 = Real, 1 = SDXL-generated)
labels = ['Real', 'SDXL-generated']
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence*100:.2f}%")
```

### Batch Prediction

```python
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

# Reuses `model` and `transform` from the Quick Start example.

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(image)

# Create dataset and loader (replace with your own file paths)
image_paths = ['image1.jpg', 'image2.jpg']
dataset = ImageDataset(image_paths, transform)
# On Windows/macOS, run this under `if __name__ == "__main__":` when num_workers > 0
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []

model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        confs, preds = probs.max(dim=1)  # top-class probability and class index

        predictions.extend(preds.cpu().tolist())
        confidences.extend(confs.cpu().tolist())
```

## Limitations

1. **Generator-Specific:** Only trained on SDXL 1.0. Will not reliably detect:
   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
   - Midjourney, DALL-E, Flux
   - Other generative models

2. **Resolution-Specific:** Optimized for 1024×1024 SDXL images. Performance may degrade on:
   - Lower resolutions
   - Higher resolutions
   - Non-square aspect ratios

3. **Dataset Bias:** Real training images come from a few specific categories (food, animals, pets, vehicles, and agricultural images). The model may perform differently on:
   - Artistic images
   - Abstract images
   - Specialized domains (medical, satellite, etc.)

4. **Adversarial Attacks:** Not hardened against adversarial perturbations
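Because the resolution limitation depends on the input, a caller might flag images outside the evaluated regime before trusting a score. A minimal sketch; `resolution_warning` is a hypothetical helper and its messages are illustrative:

```python
from typing import Optional, Tuple

SDXL_NATIVE_SIZE = (1024, 1024)  # the only resolution covered by the reported evaluation

def resolution_warning(size: Tuple[int, int]) -> Optional[str]:
    """Return a warning for inputs outside the evaluated regime, else None."""
    width, height = size
    if (width, height) == SDXL_NATIVE_SIZE:
        return None
    if width != height:
        return f"{width}x{height} is not square; only 1024x1024 inputs were evaluated"
    return f"{width}x{height} differs from 1024x1024; accuracy may degrade"
```

With Pillow, pass `Image.open(path).size`, which is a `(width, height)` tuple.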

## Ethical Considerations

### Intended Applications
✅ Content moderation  
✅ Academic research  
✅ Digital forensics  
✅ Media verification  

### Prohibited Uses
❌ Surveillance without consent  
❌ Discrimination or profiling  
❌ Bypassing content policies  

### False Positives/Negatives
- **False Positives (0.45%, 6 of 1,320 real test images):** Real images misclassified as SDXL-generated
  - May unfairly flag authentic content
  - Always provide human review for high-stakes decisions
  
- **False Negatives (0.07%, 1 of 1,536 SDXL test images):** SDXL images misclassified as real
  - SDXL-generated content may slip through
  - Use as part of multi-layer verification
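One way to act on these error rates is to route uncertain or high-stakes predictions to a person. A sketch assuming the class order from the Quick Start (index 1 = SDXL-generated, so `prob_sdxl = probs[0][1].item()`); the `review_band` bounds are hypothetical and would need tuning on your own data:

```python
from typing import Tuple

LABELS = ("Real", "SDXL-generated")  # index order used in the Quick Start example

def route_prediction(prob_sdxl: float, high_stakes: bool = False,
                     review_band: Tuple[float, float] = (0.2, 0.8)) -> Tuple[str, bool]:
    """Return (predicted label, whether a human should review it)."""
    low, high = review_band
    label = LABELS[1] if prob_sdxl >= 0.5 else LABELS[0]
    # High-stakes decisions always get review; otherwise only uncertain scores do
    needs_review = high_stakes or (low < prob_sdxl < high)
    return label, needs_review
```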

### Transparency
This model should be deployed with clear communication to users about:
- Its specific purpose (SDXL detection only)
- Its limitations (not for other generators)
- Confidence scores for each prediction
- The possibility of errors

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}
```

## Model Card Authors

ash12321

## Model Card Contact

For questions or issues, please open an issue on the model repository.

## License

MIT License

## Changelog

### Version 1.0 (2025-12-30)
- Initial release
- 99.75% test accuracy on SDXL detection
- ResNet-50 architecture
- Trained on 19,034 images (9,034 real + 10,000 SDXL)

---

**Keywords:** SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision