---
language: en
license: mit
tags:
- image-classification
- fake-detection
- sdxl
- ai-detection
- deepfake-detection
datasets:
- food101
- huggan/AFHQ
- timm/oxford-iiit-pet
- tanganke/stanford_cars
- beans
- ash12321/sdxl-generated-10k
metrics:
- accuracy
- f1
- precision
- recall
- auc
library_name: pytorch
pipeline_tag: image-classification
---

# SDXL Detector (ResNet-50)

## Model Description

A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

**Architecture:** ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

**Training Date:** December 30, 2025

**Purpose:** This is a specialist model built specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use it as part of an ensemble with other specialist models.

## Performance Metrics

### Test Set Results (2,856 images)

| Metric | Score |
|--------|-------|
| **Accuracy** | **99.75%** |
| **F1 Score** | **99.77%** |
| **Precision** | **99.61%** |
| **Recall** | **99.93%** |
| **AUC-ROC** | **0.9999** |
| **Average Precision** | **0.9999** |

F1, precision, and recall in this table are reported for the SDXL-generated (Fake) class.

### Per-Class Performance

```
          precision    recall    f1-score    support
Real         99.92%    99.55%      99.73%      1,320
Fake         99.61%    99.93%      99.77%      1,536
```

### Training Details

- **Total Epochs:** 12
- **Final Training Accuracy:** 99.92%
- **Final Validation Accuracy:** 99.75%
- **Training Time:** ~6 minutes on an H100 GPU
- **Model Parameters:** 24,559,170

### Confusion Matrix

Out of 2,856 test images:

- **Real images (1,320):** 1,314 correct, 6 misclassified
- **Fake images (1,536):** 1,535 correct, 1 misclassified
- **Total errors:** 7 images (0.25% error rate)

## Intended Use

### Primary Use Case

Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
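The detector was trained and tested only on 1024×1024 SDXL 1.0 outputs, with real images resized to match. A deployment may therefore want to flag inputs that fall outside that setting before it relies on a score. The sketch below is a hypothetical helper and is not part of this repository. It looks only at pixel dimensions (for example Pillow's `Image.size`, which is `(width, height)`) and maps them to the resolution caveats listed under Limitations.

```python
from typing import Optional

# Native SDXL 1.0 output size that this detector was trained and tested on
EXPECTED_SIZE = (1024, 1024)


def resolution_warning(width: int, height: int) -> Optional[str]:
    """Hypothetical helper: return a warning string when (width, height) falls
    outside the 1024x1024 setting the detector targets, or None when it matches."""
    if (width, height) == EXPECTED_SIZE:
        return None
    if width != height:
        return "non-square aspect ratio: prediction may be unreliable"
    if width < EXPECTED_SIZE[0]:
        return "lower than 1024x1024: prediction may be unreliable"
    return "higher than 1024x1024: prediction may be unreliable"
```

For example, `resolution_warning(*Image.open("photo.jpg").size)` returns `None` only when the file is exactly 1024×1024. A check like this only surfaces the documented limitation. It does not make predictions on other sizes more reliable.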
### What This Model Can Do

- ✅ Detect SDXL 1.0 generated images with 99.75% accuracy (held-out test set)
- ✅ Identify SDXL-specific generation patterns and artifacts
- ✅ Work with 1024×1024 SDXL outputs

### What This Model Cannot Do

- ❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
- ❌ Work reliably on resolutions other than 1024×1024
- ❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)

**Note:** For comprehensive AI image detection across multiple generators, use this model as part of an ensemble with other specialist detectors.

## Training Data

### Real Images (9,034 total)

- **Food101:** 2,000 images (food photography)
- **AFHQ:** 2,000 images (animal faces)
- **Oxford Pets:** 2,000 images (pet photography)
- **Stanford Cars:** 2,000 images (vehicle photography)
- **Beans:** 1,034 images (agricultural images)

All real images were resized to 1024×1024 to match SDXL output dimensions.

### Fake Images (10,000 total)

- **Source:** SDXL 1.0 generated images
- **Resolution:** 1024×1024
- **Dataset:** ash12321/sdxl-generated-10k

### Data Split

- Training: 70% (13,323 images)
- Validation: 15% (2,855 images)
- Test: 15% (2,856 images)

## Model Architecture

**Base Model:** ResNet-50 (pretrained on ImageNet)

**Custom Classifier Head** (replaces ResNet-50's final `fc` layer):

```python
import torch.nn as nn

# Takes the 2048-dim pooled ResNet-50 features and outputs 2 logits (Real, SDXL)
head = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(2048, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.15),
    nn.Linear(512, 2),
)
```

**Input:** RGB images at 224×224. At inference, resize the shorter side to 256 and then center-crop to 224×224.

**Output:** Binary classification (Real vs SDXL-generated)

## Training Configuration

### Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 0.001 (with cosine annealing)
- **Batch Size:** 128
- **Weight Decay:** 0.01
- **Dropout:** 0.3 (first head layer) and 0.15 (before the output layer)
- **Label Smoothing:** 0.05
- **Mixed Precision:** bfloat16 (H100 optimized)

### Augmentation (Training Only)

- RandomResizedCrop (scale: 0.8-1.0)
- RandomHorizontalFlip (p=0.5)
- RandomRotation (±15°)
- ColorJitter (brightness, contrast, saturation, hue)
- Normalization (ImageNet stats)

### Hardware

- **GPU:** NVIDIA H100
- **Training Time:** ~6 minutes
- **Inference Speed:** ~4 ms per image (H100)

## Usage

### Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

### Quick Start

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms
from PIL import Image
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth"
)

# Load the checkpoint on CPU.
# PyTorch >= 2.6 defaults to weights_only=True; if loading fails and you
# trust this file, pass weights_only=False.
checkpoint = torch.load(model_path, map_location="cpu")


# ResNet-50 backbone with the custom classifier head
class SDXLDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # weights=None: the fine-tuned weights are loaded from the checkpoint below
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features  # 2048 for ResNet-50
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2)
        )

    def forward(self, x):
        return self.backbone(x)


# Initialize and load weights
model = SDXLDetector()
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # disables dropout and uses BatchNorm running statistics

# Preprocessing: resize, center-crop to 224x224, ImageNet normalization
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Predict
image = Image.open("test_image.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0)  # add batch dimension: [1, 3, 224, 224]

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results (class index 0 = Real, 1 = SDXL-generated)
labels = ["Real", "SDXL-generated"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence * 100:.2f}%")
```

### Batch Prediction

```python
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

# Reuses `model` and `transform` from the Quick Start example above


class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(image)


# Create dataset and loader (add more image paths as needed)
image_paths = ["image1.jpg", "image2.jpg"]
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []
model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        # torch.max over classes returns (max probability, predicted class index)
        confs, preds = torch.max(probs, dim=1)
        predictions.extend(preds.cpu().tolist())
        confidences.extend(confs.cpu().tolist())
```

## Limitations

1. **Generator-Specific:** Trained only on SDXL 1.0. It will not reliably detect:
   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
   - Midjourney, DALL-E, Flux
   - Other generative models
2. **Resolution-Specific:** Optimized for 1024×1024 SDXL images. Performance may degrade on:
   - Lower resolutions
   - Higher resolutions
   - Non-square aspect ratios
3. **Dataset Bias:** Trained on specific real image categories (food, animals, vehicles, etc.). It may perform differently on:
   - Artistic images
   - Abstract images
   - Specialized domains (medical, satellite, etc.)
4. **Adversarial Attacks:** Not hardened against adversarial perturbations.

## Ethical Considerations

### Intended Applications

- ✅ Content moderation
- ✅ Academic research
- ✅ Digital forensics
- ✅ Media verification

### Prohibited Uses

- ❌ Surveillance without consent
- ❌ Discrimination or profiling
- ❌ Bypassing content policies

### False Positives/Negatives

- **False Positives (0.45%):** Real images misclassified as SDXL-generated (6 of 1,320 real test images)
  - May unfairly flag authentic content
  - Always provide human review for high-stakes decisions
- **False Negatives (0.07%):** SDXL images misclassified as real (1 of 1,536 SDXL test images)
  - SDXL-generated content may slip through
  - Use as part of a multi-layer verification process

### Transparency

This model should be deployed with clear communication to users about:

- Its specific purpose (SDXL detection only)
- Its limitations (not for other generators)
- Confidence scores for each prediction
- The possibility of errors

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}
```

## Model Card Authors

ash12321

## Model Card Contact

For questions or issues, please open an issue on the model repository.

## License

MIT License

## Changelog

### Version 1.0 (2025-12-30)

- Initial release
- 99.75% test accuracy on SDXL detection
- ResNet-50 architecture
- Trained on 19,034 images (9,034 real + 10,000 SDXL)

---

**Keywords:** SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision
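The headline test metrics follow directly from the confusion-matrix counts in this card: 1,314 real images correct and 6 wrong, and 1,535 SDXL images correct and 1 wrong. So do the 0.45% false positive and 0.07% false negative rates quoted under Ethical Considerations. The pure-Python check below treats SDXL-generated as the positive class, as the headline table does. The function name `metrics_from_counts` is illustrative only.

```python
def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Binary-classification metrics with SDXL-generated ("Fake") as the positive class.

    tp: SDXL images predicted SDXL    fn: SDXL images predicted real
    tn: real images predicted real    fp: real images predicted SDXL
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),  # real images flagged as SDXL
        "false_negative_rate": fn / (fn + tp),  # SDXL images passed as real
    }


# Test-set counts from the Confusion Matrix section
m = metrics_from_counts(tp=1535, fn=1, tn=1314, fp=6)
for name, value in m.items():
    print(f"{name}: {value * 100:.2f}%")
```

Running it prints 99.75% accuracy, 99.61% precision, 99.93% recall, 99.77% F1, a 0.45% false positive rate, and a 0.07% false negative rate. These match the figures reported above.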