---
language: en
license: mit
tags:
- image-classification
- fake-detection
- sdxl
- ai-detection
- deepfake-detection
datasets:
- food101
- huggan/AFHQ
- timm/oxford-iiit-pet
- tanganke/stanford_cars
- beans
- ash12321/sdxl-generated-10k
metrics:
- accuracy
- f1
- precision
- recall
- auc
library_name: pytorch
pipeline_tag: image-classification
---

# SDXL Detector (ResNet-50)

## Model Description

A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

**Architecture:** ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

**Training Date:** December 30, 2025

**Purpose:** This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.

## Performance Metrics

### Test Set Results (2,856 images)

| Metric | Score |
|--------|-------|
| **Accuracy** | **99.75%** |
| **F1 Score** | **99.77%** |
| **Precision** | **99.61%** |
| **Recall** | **99.93%** |
| **AUC-ROC** | **0.9999** |
| **Average Precision** | **0.9999** |

### Per-Class Performance

```
          precision    recall  f1-score   support
Real         99.92%    99.55%    99.73%     1,320
Fake         99.61%    99.93%    99.77%     1,536
```

### Training Details

- **Total Epochs:** 12
- **Final Training Accuracy:** 99.92%
- **Final Validation Accuracy:** 99.75%
- **Training Time:** ~6 minutes on H100 GPU
- **Model Parameters:** 24,559,170

### Confusion Matrix

Out of 2,856 test images:

- **Real images (1,320):** 1,314 correct, 6 misclassified as SDXL-generated
- **Fake images (1,536):** 1,535 correct, 1 misclassified as real
- **Total errors:** 7 images (0.25% error rate)

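The headline metrics can be recomputed from the four confusion-matrix counts. The sketch below is only a consistency check, not the evaluation script used for this model; it treats "Fake" as the positive class, and the function name `classification_metrics` is illustrative.

```python
# Counts taken from the confusion matrix ("Fake" is the positive class).
TP = 1535  # SDXL images correctly flagged
FN = 1     # SDXL images predicted as real
TN = 1314  # real images correctly passed
FP = 6     # real images flagged as SDXL-generated


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


metrics = classification_metrics(TP, FP, TN, FN)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Rounded to two decimals, the values agree with the reported accuracy, precision, recall, and F1 for the Fake class.
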
## Intended Use

### Primary Use Case
Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

### What This Model Can Do
- ✅ Detect SDXL 1.0 generated images with 99.75% accuracy on the held-out test set
- ✅ Identify SDXL-specific generation patterns and artifacts
- ✅ Work with 1024×1024 SDXL outputs

### What This Model Cannot Do
- ❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
- ❌ Work reliably on non-1024×1024 resolutions
- ❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)

**Note:** For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.

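One simple way to combine specialist detectors is a max rule over their "fake" probabilities. The sketch below is an assumption about how such an ensemble could look, not part of this repository: the function `ensemble_fake_probability` and the non-SDXL detector names and scores are hypothetical.

```python
from typing import Dict


def ensemble_fake_probability(specialist_probs: Dict[str, float]) -> float:
    """Combine per-generator 'fake' probabilities with a max rule.

    Each specialist only knows one generator, so an image is treated as
    AI-generated if any single specialist is confident about it.
    """
    if not specialist_probs:
        raise ValueError("at least one specialist score is required")
    return max(specialist_probs.values())


# Hypothetical scores: only the "sdxl" entry corresponds to this model.
scores = {"sdxl": 0.98, "midjourney": 0.03, "dalle": 0.10}
combined = ensemble_fake_probability(scores)
most_likely_source = max(scores, key=scores.get)
print(f"P(fake) = {combined:.2f}, most likely source: {most_likely_source}")
```

A max rule keeps each specialist's recall but adds up their false positives, so averaging or a learned combiner may suit other deployments better.
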
## Training Data

### Real Images (9,034 total)
- **Food101:** 2,000 images (food photography)
- **AFHQ:** 2,000 images (animal faces)
- **Oxford Pets:** 2,000 images (pet photography)
- **Stanford Cars:** 2,000 images (vehicle photography)
- **Beans:** 1,034 images (agricultural images)

All real images were resized to 1024×1024 to match SDXL output dimensions.

### Fake Images (10,000 total)
- **Source:** SDXL 1.0 generated images
- **Resolution:** 1024×1024
- **Dataset:** ash12321/sdxl-generated-10k

### Data Split
- Training: 70% (13,323 images)
- Validation: 15% (2,855 images)
- Test: 15% (2,856 images)

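The split sizes follow from the 19,034-image total if the training and validation counts are rounded down and the test set takes the remainder. The sketch below reproduces that arithmetic and shows one possible shuffled index split; the seed and the helper names `split_sizes` and `split_indices` are assumptions, and the actual split procedure used for this model is not documented here.

```python
import random

N_REAL, N_FAKE = 9_034, 10_000
TOTAL = N_REAL + N_FAKE  # 19,034 images


def split_sizes(n: int, train_frac: float = 0.70, val_frac: float = 0.15):
    """Floor the train/val sizes and give the remainder to the test set."""
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return n_train, n_val, n - n_train - n_val


def split_indices(n: int, seed: int = 42):
    """Shuffle indices 0..n-1 and cut them into train/val/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_train, n_val, _ = split_sizes(n)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


print(split_sizes(TOTAL))  # (13323, 2855, 2856)
train_idx, val_idx, test_idx = split_indices(TOTAL)
```
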
## Model Architecture

**Base Model:** ResNet-50 (pretrained on ImageNet)

**Custom Classifier Head:**
```python
import torch.nn as nn

# Replaces ResNet-50's final fully connected layer (2048 input features)
classifier_head = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(2048, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(),
    nn.Dropout(p=0.15),
    nn.Linear(512, 2),  # two logits: Real, SDXL-generated
)
```

- **Input:** RGB images, resized and center-cropped to 224×224
- **Output:** Binary classification (Real vs SDXL-generated)

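The reported parameter count of 24,559,170 can be checked by hand from the head definition. The sketch below assumes the standard torchvision ResNet-50 size of 25,557,032 parameters, of which the original 1000-class `fc` layer accounts for 2,049,000; the helper names are illustrative.

```python
# torchvision's ResNet-50 has 25,557,032 parameters in total; removing its
# 1000-class fc layer (2048 * 1000 weights + 1000 biases) leaves the trunk.
BACKBONE_PARAMS = 25_557_032 - (2048 * 1000 + 1000)  # 23,508,032


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def batchnorm_params(num_features: int) -> int:
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * num_features


head_params = (
    linear_params(2048, 512)   # 1,049,088
    + batchnorm_params(512)    # 1,024
    + linear_params(512, 2)    # 1,026
)
total_params = BACKBONE_PARAMS + head_params
print(f"head: {head_params:,}  total: {total_params:,}")
```

Dropout and ReLU layers contribute no parameters, so the head adds 1,051,138 and the total comes to 24,559,170.
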
## Training Configuration

### Hyperparameters
- **Optimizer:** AdamW
- **Learning Rate:** 0.001 (with cosine annealing)
- **Batch Size:** 128
- **Weight Decay:** 0.01
- **Dropout:** 0.3 (0.15 before the final layer)
- **Label Smoothing:** 0.05
- **Mixed Precision:** bfloat16 (H100 optimized)

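To make two of these settings concrete, the sketch below evaluates the closed-form cosine annealing schedule and the label-smoothed targets in plain Python. It assumes the schedule is stepped once per epoch over the 12 epochs and decays to a minimum of 0; the training script may have stepped it differently. The target formula matches how PyTorch's `CrossEntropyLoss(label_smoothing=...)` spreads the smoothing mass uniformly over all classes.

```python
import math

BASE_LR = 1e-3          # from the hyperparameter list
EPOCHS = 12             # from the training details
LABEL_SMOOTHING = 0.05  # from the hyperparameter list
NUM_CLASSES = 2


def cosine_annealed_lr(epoch: int, base_lr: float = BASE_LR,
                       total_epochs: int = EPOCHS, min_lr: float = 0.0) -> float:
    """Closed-form cosine annealing from base_lr down to min_lr."""
    cosine = 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cosine


def smoothed_target(true_class: int, eps: float = LABEL_SMOOTHING,
                    num_classes: int = NUM_CLASSES) -> list:
    """Target distribution: (1 - eps) on the true class plus eps / K everywhere."""
    off_value = eps / num_classes
    return [
        (1 - eps) + off_value if c == true_class else off_value
        for c in range(num_classes)
    ]


for epoch in (0, 6, 12):
    print(f"epoch {epoch:2d}: lr = {cosine_annealed_lr(epoch):.6f}")
print("target for an SDXL image:", smoothed_target(true_class=1))
```

With two classes, a smoothing value of 0.05 turns the hard target `[0, 1]` into roughly `[0.025, 0.975]`.
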
### Augmentation (Training Only)
- RandomResizedCrop (scale: 0.8-1.0)
- RandomHorizontalFlip (p=0.5)
- RandomRotation (±15°)
- ColorJitter (brightness, contrast, saturation, hue)
- Normalization (ImageNet stats)

### Hardware
- **GPU:** NVIDIA H100
- **Training Time:** ~6 minutes
- **Inference Speed:** ~4ms per image (H100)

## Usage

### Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

### Quick Start

```python
import torch
import torch.nn as nn
import torchvision.models as models
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="ash12321/sdxl-detector-resnet50",
    filename="best.pth",
)

# Load the checkpoint on CPU
checkpoint = torch.load(model_path, map_location="cpu")


class SDXLDetector(nn.Module):
    """ResNet-50 backbone with the custom two-class classifier head."""

    def __init__(self):
        super().__init__()
        # weights=None: all weights come from the checkpoint
        # (on torchvision < 0.13 use pretrained=False instead)
        self.backbone = models.resnet50(weights=None)
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(num_features, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.15),
            nn.Linear(512, 2),
        )

    def forward(self, x):
        return self.backbone(x)


# Initialize the architecture and load the fine-tuned weights
model = SDXLDetector()
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Preprocessing: resize, center-crop to 224×224, normalize with ImageNet stats
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Predict on a single image
image = Image.open("test_image.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    outputs = model(input_tensor)
    probs = torch.softmax(outputs, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

# Results: class 0 is Real, class 1 is SDXL-generated
labels = ["Real", "SDXL-generated"]
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {confidence * 100:.2f}%")
```

### Batch Prediction

```python
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

# Reuses `model` and `transform` from the Quick Start example.


class ImageDataset(Dataset):
    """Loads images from a list of file paths and applies the preprocessing."""

    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(image)


# Create dataset and loader
image_paths = ["image1.jpg", "image2.jpg"]  # ... add further paths here
dataset = ImageDataset(image_paths, transform)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

# Batch inference
predictions = []
confidences = []

model.eval()
with torch.no_grad():
    for batch in loader:
        outputs = model(batch)
        probs = torch.softmax(outputs, dim=1)
        confs, preds = torch.max(probs, dim=1)  # top probability and its class

        predictions.extend(preds.cpu().tolist())
        confidences.extend(confs.cpu().tolist())
```

## Limitations

1. **Generator-Specific:** Only trained on SDXL 1.0. Will not reliably detect:
   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
   - Midjourney, DALL-E, Flux
   - Other generative models

2. **Resolution-Specific:** Optimized for 1024×1024 SDXL images. Performance may degrade on:
   - Lower resolutions
   - Higher resolutions
   - Non-square aspect ratios

3. **Dataset Bias:** Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
   - Artistic images
   - Abstract images
   - Specialized domains (medical, satellite, etc.)

4. **Adversarial Attacks:** Not hardened against adversarial perturbations

## Ethical Considerations

### Intended Applications
- ✅ Content moderation
- ✅ Academic research
- ✅ Digital forensics
- ✅ Media verification

### Prohibited Uses
- ❌ Surveillance without consent
- ❌ Discrimination or profiling
- ❌ Bypassing content policies

### False Positives/Negatives
- **False Positives (0.45% of real test images):** Real images misclassified as SDXL-generated
  - May unfairly flag authentic content
  - Always provide human review for high-stakes decisions

- **False Negatives (0.07% of SDXL test images):** SDXL images misclassified as real
  - SDXL-generated content may slip through
  - Use as part of multi-layer verification

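One way to put the human-review recommendation into practice is to act automatically only on confident predictions and escalate the rest. The sketch below is illustrative: the 0.90 threshold, the function `route_prediction`, and the action names are assumptions to be tuned per deployment, not values validated for this model.

```python
REVIEW_THRESHOLD = 0.90  # assumed cut-off; tune on your own validation data


def route_prediction(prob_fake: float, threshold: float = REVIEW_THRESHOLD) -> str:
    """Map the softmax probability of the SDXL class to a moderation action."""
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must be a probability in [0, 1]")
    confidence = max(prob_fake, 1.0 - prob_fake)  # confidence of the predicted class
    if confidence < threshold:
        return "human_review"
    return "flag_as_sdxl" if prob_fake >= 0.5 else "pass_as_real"


for p in (0.99, 0.60, 0.02):
    print(f"P(SDXL) = {p:.2f} -> {route_prediction(p)}")
```

Here `prob_fake` corresponds to `probs[0][1]` in the Quick Start example.
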
### Transparency
This model should be deployed with clear communication to users about:
- Its specific purpose (SDXL detection only)
- Its limitations (not for other generators)
- Confidence scores for each prediction
- The possibility of errors

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{sdxl_detector_2025,
  author = {ash12321},
  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
}
```

## Model Card Authors

ash12321

## Model Card Contact

For questions or issues, please open an issue on the model repository.

## License

MIT License

## Changelog

### Version 1.0 (2025-12-30)
- Initial release
- 99.75% test accuracy on SDXL detection
- ResNet-50 architecture
- Built from 19,034 images (9,034 real + 10,000 SDXL), split 70/15/15 into training, validation, and test sets

---

**Keywords:** SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision