ash12321/sdxl-detector-resnet50

@@ -1,83 +1,355 @@
 ---
 license: mit
 tags:
 - image-classification
-- ai-detection
 - sdxl
 - deepfake-detection
 library_name: pytorch
 ---
-# SDXL Detector - ResNet50
-Binary classifier for detecting AI-generated images from Stable Diffusion XL.
-## Model Details
-- **Architecture**: ResNet-50 (ImageNet pretrained)
-- **Task**: Binary classification (Real vs Fake)
-- **Training Data**: 10,000 real + 10,000 SDXL images
-- **Input Size**: 256×256 RGB
-- **Classes**: Real (0), Fake (1)
-## Performance
-See `test_results.json` for detailed metrics.
 ## Usage
 ```python
 import torch
-from torchvision import models, transforms
 from PIL import Image
-# Load model
-model = models.resnet50()
-model.fc = torch.nn.Sequential(
-    torch.nn.Dropout(0.5),
-    torch.nn.Linear(2048, 2)
 )
-checkpoint = torch.load('pytorch_model.bin')
 model.load_state_dict(checkpoint['model_state_dict'])
 model.eval()
-# Prepare image
 transform = transforms.Compose([
-    transforms.Resize((256, 256)),
     transforms.ToTensor(),
-    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
 ])
-image = Image.open('test.jpg').convert('RGB')
-image = transform(image).unsqueeze(0)
 # Predict
 with torch.no_grad():
-    output = model(image)
-    probs = torch.softmax(output, dim=1)
-    pred = output.argmax(1).item()
-print(f"Prediction: {'Fake' if pred == 1 else 'Real'}")
-print(f"Confidence: {probs[0][pred].item()*100:.2f}%")
 ```
-## Files
-- `pytorch_model.bin`: Model weights
-- `config.json`: Configuration
-- `training_history.csv`: Training metrics
-- `test_results.json`: Test results
-- `*.png`: Visualizations
-## Training
-- Epochs: 30
-- Batch Size: 32
-- Learning Rate: 0.0001
-- Optimizer: AdamW
-- Early Stopping: Patience 5
-## Dataset
-Generated images: [ash12321/sdxl-generated-10k](https://huggingface.co/datasets/ash12321/sdxl-generated-10k)

 ---
+language: en
 license: mit
 tags:
 - image-classification
+- fake-detection
 - sdxl
+- ai-detection
 - deepfake-detection
+datasets:
+- food101
+- huggan/AFHQ
+- timm/oxford-iiit-pet
+- tanganke/stanford_cars
+- beans
+- ash12321/sdxl-generated-10k
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+- auc
 library_name: pytorch
+pipeline_tag: image-classification
 ---
+# SDXL Detector (ResNet-50)
+## Model Description
+A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
+**Architecture:** ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)
+**Training Date:** December 30, 2025
+**Purpose:** This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.
+## Performance Metrics
+### Test Set Results (2,856 images)
+| Metric | Score |
+|--------|-------|
+| **Accuracy** | **99.75%** |
+| **F1 Score** | **99.77%** |
+| **Precision** | **99.61%** |
+| **Recall** | **99.93%** |
+| **AUC-ROC** | **0.9999** |
+| **Average Precision** | **0.9999** |
+### Per-Class Performance
+```
+              precision    recall  f1-score   support
+       Real      99.92%    99.55%    99.73%     1,320
+       Fake      99.61%    99.93%    99.77%     1,536
+```
+### Training Details
+- **Total Epochs:** 12
+- **Final Training Accuracy:** 99.92%
+- **Final Validation Accuracy:** 99.75%
+- **Training Time:** ~6 minutes on H100 GPU
+- **Model Parameters:** 24,559,170
+### Confusion Matrix
+Out of 2,856 test images:
+- **Real images (1,320):** 1,314 correct, 6 misclassified
+- **Fake images (1,536):** 1,535 correct, 1 misclassified
+- **Total errors:** 7 of 2,856 images (0.25% error rate)
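The headline metrics can be recomputed from these counts by treating SDXL-generated as the positive class. The sketch below does this in plain Python. The helper name `metrics_from_counts` is illustrative and is not part of this repository.

```python
def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Binary metrics with SDXL-generated (label 1) as the positive class."""
    total = tp + fn + tn + fp
    precision = tp / (tp + fp)  # of images flagged as SDXL, fraction truly SDXL
    recall = tp / (tp + fn)     # of SDXL images, fraction caught
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn),  # real images flagged as SDXL
        "false_negative_rate": fn / (fn + tp),  # SDXL images passed as real
    }


# Counts from the confusion matrix: 1,535 of 1,536 fakes caught, 6 of 1,320 reals flagged
m = metrics_from_counts(tp=1535, fn=1, tn=1314, fp=6)
for name, value in m.items():
    print(f"{name:>20}: {value * 100:.2f}%")
```

Each printed value rounds to the figure reported in the tables above. That includes the 0.45% false-positive and 0.07% false-negative rates cited under Ethical Considerations.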
+## Intended Use
+### Primary Use Case
+Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.
+### What This Model Can Do
+✅ Detect SDXL 1.0-generated images (99.75% accuracy on the held-out test set)
+✅ Identify SDXL-specific generation patterns and artifacts
+✅ Work with 1024×1024 SDXL outputs
+### What This Model Cannot Do
+❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
+❌ Work reliably on non-1024×1024 resolutions
+❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)
+**Note:** For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
+## Training Data
+### Real Images (9,034 total)
+- **Food101:** 2,000 images (food photography)
+- **AFHQ:** 2,000 images (animal faces)
+- **Oxford Pets:** 2,000 images (pet photography)
+- **Stanford Cars:** 2,000 images (vehicle photography)
+- **Beans:** 1,034 images (agricultural images)
+All real images were resized to 1024×1024 to match SDXL output dimensions.
+### Fake Images (10,000 total)
+- **Source:** SDXL 1.0 generated images
+- **Resolution:** 1024×1024
+- **Dataset:** ash12321/sdxl-generated-10k
+### Data Split
+- Training: 70% (13,323 images)
+- Validation: 15% (2,855 images)
+- Test: 15% (2,856 images)
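These sizes are what you get by flooring 70% and 15% of the 19,034 images and giving the remainder to the test set. The sketch below shows a seeded split built that way. The card does not say how the split was actually produced, so the rounding rule, the seed, and the helper name `split_indices` are all assumptions.

```python
import random


def split_indices(n: int, train_frac: float = 0.70, val_frac: float = 0.15, seed: int = 42):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test index lists.

    Assumed convention: train and validation sizes are floored and the test
    set takes the remainder, which reproduces the sizes listed above.
    """
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(9_034 + 10_000)
print(len(train), len(val), len(test))  # 13323 2855 2856
```

This split is not stratified by class. A stratified variant would split the real and SDXL indices separately and then concatenate the parts.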
+## Model Architecture
+**Base Model:** ResNet-50 (pretrained on ImageNet)
+**Custom Classifier Head:**
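+```python
+import torch.nn as nn
+# Replaces ResNet-50's ImageNet fc layer; input is the 2048-d pooled feature vector
+head = nn.Sequential(
+    nn.Dropout(p=0.3),
+    nn.Linear(2048, 512),
+    nn.BatchNorm1d(512),
+    nn.ReLU(),
+    nn.Dropout(p=0.15),
+    nn.Linear(512, 2),  # logits for [Real, SDXL-generated]
+)
+```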
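The 24,559,170 parameters reported under Training Details break down into the ResNet-50 backbone plus this head. The plain-Python sketch below does the arithmetic. It assumes torchvision's published total of 25,557,032 parameters for ResNet-50, of which the original 2048→1000 fc layer accounts for 2,049,000.

```python
def linear_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features  # weight matrix + bias


def batchnorm_params(num_features: int) -> int:
    # Learnable scale (weight) and shift (bias); running mean/var are buffers, not parameters
    return 2 * num_features


# torchvision ResNet-50 total minus its ImageNet classifier (2048 -> 1000)
backbone = 25_557_032 - linear_params(2048, 1000)
# Dropout and ReLU contribute no parameters
head = linear_params(2048, 512) + batchnorm_params(512) + linear_params(512, 2)
print(f"backbone={backbone:,} head={head:,} total={backbone + head:,}")
# backbone=23,508,032 head=1,051,138 total=24,559,170
```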
+**Input:** 224×224 RGB images (at inference: resize the shorter side to 256, then center-crop to 224×224)
+**Output:** Binary classification (Real vs SDXL-generated)
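SDXL outputs are 1024×1024, so the inference transform (`Resize(256)` then `CenterCrop(224)`) first downsamples each side by 4× and then discards a border. The plain-Python sketch below traces the image size through those two steps. It follows torchvision's documented behavior for an integer `Resize` size, where the shorter side is matched and the aspect ratio is kept. `trace_preprocessing` is a hypothetical helper for illustration only.

```python
def trace_preprocessing(width: int, height: int, resize: int = 256, crop: int = 224) -> dict:
    """Trace (width, height) through Resize(resize) followed by CenterCrop(crop)."""
    short, long = min(width, height), max(width, height)
    new_long = int(resize * long / short)  # shorter side -> resize, aspect ratio kept
    resized = (resize, new_long) if width <= height else (new_long, resize)
    scale = short / resize  # original pixels per resized pixel
    kept = round(crop * scale)  # side of the original region that survives the crop
    return {
        "resized": resized,
        "cropped": (crop, crop),
        "original_region_kept": (kept, kept),
        "area_fraction_kept": kept * kept / (width * height),
    }


print(trace_preprocessing(1024, 1024))
# {'resized': (256, 256), 'cropped': (224, 224), 'original_region_kept': (896, 896), 'area_fraction_kept': 0.765625}
```

For a 1024×1024 SDXL image the model sees the central 896×896 region at quarter resolution. For a 1024×768 image the same call reports that only about 57% of the frame reaches the model.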
+## Training Configuration
+### Hyperparameters
+- **Optimizer:** AdamW
+- **Learning Rate:** 0.001 (with cosine annealing)
+- **Batch Size:** 128
+- **Weight Decay:** 0.01
+- **Dropout:** 0.3 and 0.15 (the two dropout layers in the classifier head)
+- **Label Smoothing:** 0.05
+- **Mixed Precision:** bfloat16 (H100 optimized)
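Two of these settings change the training signal in ways worth spelling out:

- **Label smoothing:** With ε = 0.05 over 2 classes, the one-hot target becomes (0.975, 0.025). PyTorch's `CrossEntropyLoss(label_smoothing=...)` mixes the one-hot target with a uniform distribution in this way.
- **Cosine annealing:** The learning rate decays from 0.001 toward a floor along a half cosine.

The plain-Python sketch below illustrates both. The card does not name the implementation, the minimum learning rate, or the schedule length. A floor of 0 and annealing over the 12 recorded epochs are therefore assumptions.

```python
import math


def smoothed_targets(label: int, num_classes: int = 2, eps: float = 0.05) -> list:
    """Mix the one-hot target with a uniform distribution: (1 - eps) * onehot + eps / K."""
    return [(1 - eps) * (1.0 if i == label else 0.0) + eps / num_classes
            for i in range(num_classes)]


def label_smoothed_ce(logits: list, label: int, eps: float = 0.05) -> float:
    """Cross-entropy between the smoothed targets and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


def cosine_lr(epoch: int, total_epochs: int = 12, lr_max: float = 1e-3, lr_min: float = 0.0) -> float:
    """Cosine annealing from lr_max at epoch 0 to lr_min at total_epochs (assumed values)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))


print(smoothed_targets(label=1))  # smoothed target for an SDXL image
print(label_smoothed_ce([0.0, 4.0], label=1))
print([round(cosine_lr(e), 6) for e in (0, 6, 12)])  # [0.001, 0.0005, 0.0]
```

With smoothing, the loss never reaches zero even for a confidently correct prediction. This discourages the saturated logits that would make the reported confidence scores less informative.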
+### Augmentation (Training Only)
+- RandomResizedCrop (scale: 0.8-1.0)
+- RandomHorizontalFlip (p=0.5)
+- RandomRotation (±15°)
+- ColorJitter (brightness, contrast, saturation, hue)
+- Normalization (ImageNet stats)
+### Hardware
+- **GPU:** NVIDIA H100
+- **Training Time:** ~6 minutes
+- **Inference Speed:** ~4ms per image (H100)
 ## Usage
+### Installation
+```bash
+pip install torch torchvision pillow huggingface_hub
+```
+### Quick Start
 ```python
 import torch
+import torch.nn as nn
+import torchvision.models as models
+from torchvision import transforms
 from PIL import Image
+from huggingface_hub import hf_hub_download
+# Download the checkpoint from the Hub
+model_path = hf_hub_download(
+    repo_id="ash12321/sdxl-detector-resnet50",
+    filename="best.pth"
 )
+# Load the checkpoint on CPU. PyTorch >= 2.6 defaults to weights_only=True;
+# pass weights_only=False only if loading fails and you trust the file.
+checkpoint = torch.load(model_path, map_location='cpu')
+# Recreate the architecture used in training
+class SDXLDetector(nn.Module):
+    def __init__(self):
+        super().__init__()
+        # weights=None: no ImageNet download; trained weights come from the checkpoint
+        self.backbone = models.resnet50(weights=None)
+        num_features = self.backbone.fc.in_features
+        self.backbone.fc = nn.Sequential(
+            nn.Dropout(p=0.3),
+            nn.Linear(num_features, 512),
+            nn.BatchNorm1d(512),
+            nn.ReLU(inplace=True),
+            nn.Dropout(p=0.15),
+            nn.Linear(512, 2)
+        )
+    def forward(self, x):
+        return self.backbone(x)
+# Initialize and load weights
+model = SDXLDetector()
 model.load_state_dict(checkpoint['model_state_dict'])
 model.eval()
+# Preprocessing
 transform = transforms.Compose([
+    transforms.Resize(256),
+    transforms.CenterCrop(224),
     transforms.ToTensor(),
+    transforms.Normalize(
+        mean=[0.485, 0.456, 0.406],
+        std=[0.229, 0.224, 0.225]
+    )
 ])
 # Predict
+image = Image.open("test_image.jpg").convert('RGB')
+input_tensor = transform(image).unsqueeze(0)
 with torch.no_grad():
+    outputs = model(input_tensor)
+    probs = torch.softmax(outputs, dim=1)
+    prediction = torch.argmax(probs, dim=1).item()
+    confidence = probs[0][prediction].item()
+# Results
+labels = ['Real', 'SDXL-generated']
+print(f"Prediction: {labels[prediction]}")
+print(f"Confidence: {confidence*100:.2f}%")
 ```
+### Batch Prediction
+```python
+from torch.utils.data import DataLoader, Dataset
+class ImageDataset(Dataset):
+    def __init__(self, image_paths, transform):
+        self.image_paths = image_paths
+        self.transform = transform
+    def __len__(self):
+        return len(self.image_paths)
+    def __getitem__(self, idx):
+        image = Image.open(self.image_paths[idx]).convert('RGB')
+        return self.transform(image)
+# Create dataset and loader (reuses `model` and `transform` from Quick Start)
+image_paths = ['image1.jpg', 'image2.jpg']  # ... replace with your own image paths
+dataset = ImageDataset(image_paths, transform)
+loader = DataLoader(dataset, batch_size=32, num_workers=4)
+# Batch inference
+predictions = []
+confidences = []
+model.eval()
+with torch.no_grad():
+    for batch in loader:
+        outputs = model(batch)
+        probs = torch.softmax(outputs, dim=1)
+        preds = torch.argmax(probs, dim=1)
+        confs = torch.max(probs, dim=1)[0]
+        predictions.extend(preds.cpu().numpy())
+        confidences.extend(confs.cpu().numpy())
+```
+## Limitations
+1. **Generator-Specific:** Only trained on SDXL 1.0. Will not reliably detect:
+   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
+   - Midjourney, DALL-E, Flux
+   - Other generative models
+2. **Resolution-Specific:** Optimized for 1024×1024 SDXL images. Performance may degrade on:
+   - Lower resolutions
+   - Higher resolutions
+   - Non-square aspect ratios
+3. **Dataset Bias:** Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
+   - Artistic images
+   - Abstract images
+   - Specialized domains (medical, satellite, etc.)
+4. **Adversarial Attacks:** Not hardened against adversarial perturbations
+## Ethical Considerations
+### Intended Applications
+✅ Content moderation
+✅ Academic research
+✅ Digital forensics
+✅ Media verification
+### Prohibited Uses
+❌ Surveillance without consent
+❌ Discrimination or profiling
+❌ Bypassing content policies
+### False Positives/Negatives
+- **False Positives (0.45%, 6 of 1,320 real test images):** Real images misclassified as SDXL-generated
+  - May unfairly flag authentic content
+  - Always use human review for high-stakes decisions
+- **False Negatives (0.07%, 1 of 1,536 SDXL test images):** SDXL images misclassified as real
+  - SDXL-generated content may slip through
+  - Use as part of multi-layer verification
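One way to act on this guidance is to send uncertain predictions to a reviewer instead of auto-labeling them. The plain-Python sketch below takes P(SDXL-generated), the softmax probability of class 1 as computed in the usage examples, and applies two cut-offs. The function name `triage` and the 0.95/0.05 thresholds are hypothetical. They are not values validated for this model and would need tuning on held-out data to match how costly each error type is in a given deployment.

```python
def triage(p_sdxl: float, flag_above: float = 0.95, clear_below: float = 0.05) -> str:
    """Map P(SDXL-generated) to an action.

    Thresholds are illustrative placeholders, not calibrated values.
    """
    if not 0.0 <= p_sdxl <= 1.0:
        raise ValueError("p_sdxl must be a probability in [0, 1]")
    if p_sdxl >= flag_above:
        return "flag: likely SDXL-generated"
    if p_sdxl <= clear_below:
        return "pass: likely real"
    return "human review"


for p in (0.99, 0.50, 0.01):
    print(f"{p:.2f} -> {triage(p)}")
```

In high-stakes settings, flagged items should still go to a human reviewer. The middle band only ensures that the least certain cases are never decided automatically.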
+### Transparency
+This model should be deployed with clear communication to users about:
+- Its specific purpose (SDXL detection only)
+- Its limitations (not for other generators)
+- Confidence scores for each prediction
+- The possibility of errors
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{sdxl_detector_2025,
+  author = {ash12321},
+  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
+}
+```
+## Model Card Authors
+ash12321
+## Model Card Contact
+For questions or issues, please open an issue on the model repository.
+## License
+MIT License
+## Changelog
+### Version 1.0 (2025-12-30)
+- Initial release
+- 99.75% test accuracy on SDXL detection
+- ResNet-50 architecture
+- Trained on 19,034 images (9,034 real + 10,000 SDXL)
+---
+**Keywords:** SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision