	---
	language: en
	license: mit
	tags:
	- image-classification
	- fake-detection
	- sdxl
	- ai-detection
	- deepfake-detection
	datasets:
	- food101
	- huggan/AFHQ
	- timm/oxford-iiit-pet
	- tanganke/stanford_cars
	- beans
	- ash12321/sdxl-generated-10k
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	- auc
	library_name: pytorch
	pipeline_tag: image-classification
	---

	# SDXL Detector (ResNet-50)

	## Model Description

	A specialized deep learning model for detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

	Architecture: ResNet-50 (pretrained on ImageNet, fine-tuned for SDXL detection)

	Training Date: December 30, 2025

	Purpose: This is a specialist model designed specifically for SDXL 1.0 detection. For general AI image detection across multiple generators, use this as part of an ensemble with other specialist models.

	## Performance Metrics

	### Test Set Results (2,856 images)

	| Metric | Score |
	|--------|-------|
	| Accuracy | 99.75% |
	| F1 Score | 99.77% |
	| Precision | 99.61% |
	| Recall | 99.93% |
	| AUC-ROC | 0.9999 |
	| Average Precision | 0.9999 |

	### Per-Class Performance

	```
	         precision      recall    f1-score     support
	Real        99.92%      99.55%      99.73%       1,320
	Fake        99.61%      99.93%      99.77%       1,536
	```

	### Training Details

	- Total Epochs: 12
	- Final Training Accuracy: 99.92%
	- Final Validation Accuracy: 99.75%
	- Training Time: ~6 minutes on H100 GPU
	- Model Parameters: 24,559,170

	### Confusion Matrix

	Out of 2,856 test images:
	- Real images (1,320): 1,314 correct, 6 misclassified
	- Fake images (1,536): 1,535 correct, 1 misclassified
	- Total errors: 7 images (0.25% error rate)
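The headline metrics can be rederived from these four counts. The sketch below is plain Python with illustrative variable names; it treats "Fake" as the positive class, which is the convention that reproduces the precision and recall reported above.

```python
# Confusion-matrix counts from the list above ("Fake" is the positive class).
tp = 1535  # fake images correctly flagged as fake
fn = 1     # fake images classified as real
tn = 1314  # real images correctly classified as real
fp = 6     # real images flagged as fake

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)

print(f"accuracy            = {accuracy:.2%}")             # 99.75%
print(f"precision           = {precision:.2%}")            # 99.61%
print(f"recall              = {recall:.2%}")               # 99.93%
print(f"f1                  = {f1:.2%}")                   # 99.77%
print(f"false positive rate = {false_positive_rate:.2%}")  # 0.45%
print(f"false negative rate = {false_negative_rate:.2%}")  # 0.07%
```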

	## Intended Use

	### Primary Use Case
	Detecting images generated by Stable Diffusion XL (SDXL) 1.0 at 1024×1024 resolution.

	### What This Model Can Do
	- ✅ Detect SDXL 1.0 generated images with 99.75% accuracy on the held-out test set
	- ✅ Identify SDXL-specific generation patterns and artifacts
	- ✅ Work with 1024×1024 SDXL outputs

	### What This Model Cannot Do
	- ❌ Detect images from other generators (Midjourney, DALL-E, Flux, etc.)
	- ❌ Work reliably on non-1024×1024 resolutions
	- ❌ Detect other Stable Diffusion versions (1.5, 2.1, etc.)

	Note: For comprehensive AI image detection across multiple generators, this model should be used as part of an ensemble with other specialist detectors.
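One possible way to combine specialists is sketched below. It rests on assumptions that this card does not state: each specialist returns the probability that an image came from its own generator, and the ensemble reports the most confident specialist. The detector names, the `combine_specialists` helper, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Tuple


def combine_specialists(fake_probs: Dict[str, float],
                        threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, score) from per-generator fake probabilities.

    Hypothetical max rule: the image is attributed to the highest-scoring
    specialist if that score reaches the threshold; otherwise it is
    labeled "real". The score is always the highest fake probability.
    """
    if not fake_probs:
        raise ValueError("at least one specialist score is required")
    generator, score = max(fake_probs.items(), key=lambda item: item[1])
    if score >= threshold:
        return generator, score
    return "real", score


# Illustrative scores; only the "sdxl" entry would come from this model.
scores = {"sdxl": 0.97, "midjourney": 0.12, "dalle": 0.08}
label, score = combine_specialists(scores)
print(label, score)  # sdxl 0.97
```

A max rule favors catching generated images over avoiding false alarms, so the threshold would need tuning once several specialists are combined.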

	## Training Data

	### Real Images (9,034 total)
	- Food101: 2,000 images (food photography)
	- AFHQ: 2,000 images (animal faces)
	- Oxford Pets: 2,000 images (pet photography)
	- Stanford Cars: 2,000 images (vehicle photography)
	- Beans: 1,034 images (agricultural images)

	All real images were resized to 1024×1024 to match SDXL output dimensions.

	### Fake Images (10,000 total)
	- Source: SDXL 1.0 generated images
	- Resolution: 1024×1024
	- Dataset: ash12321/sdxl-generated-10k

	### Data Split
	- Training: 70% (13,323 images)
	- Validation: 15% (2,855 images)
	- Test: 15% (2,856 images)
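The split sizes follow from the dataset totals. The sketch below reproduces them under one assumption: the training and validation sizes are rounded down and the test split receives the remainder. The card does not state the rounding rule that was actually used.

```python
# Dataset sizes from the sections above.
num_real = 9_034
num_fake = 10_000
total = num_real + num_fake

# Assumption: train and validation sizes are rounded down (integer
# division) and the test split takes whatever is left over.
num_train = total * 70 // 100
num_val = total * 15 // 100
num_test = total - num_train - num_val

print(total, num_train, num_val, num_test)  # 19034 13323 2855 2856
```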

	## Model Architecture

	Base Model: ResNet-50 (pretrained on ImageNet)

	Custom Classifier Head:
	```python
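	import torch.nn as nn

	# Classifier head that replaces ResNet-50's final fully connected layer
	head = nn.Sequential(
	    nn.Dropout(p=0.3),
	    nn.Linear(2048, 512),
	    nn.BatchNorm1d(512),
	    nn.ReLU(),
	    nn.Dropout(p=0.15),
	    nn.Linear(512, 2),  # logits for [Real, SDXL-generated]
	)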
	```

	- Input: RGB images resized to 224×224 (the Quick Start preprocessing resizes the shorter side to 256, then center-crops to 224)
	- Output: Binary classification (Real vs SDXL-generated)
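The parameter count reported under Training Details (24,559,170) can be checked by hand. The sketch below assumes the standard torchvision ResNet-50, which has 25,557,032 parameters including its original 1000-class output layer; that figure is not stated in this card. The helper names are illustrative.

```python
# Assumption: torchvision's ResNet-50 has 25,557,032 parameters in total.
# Subtract its original 1000-class fc layer to get the backbone alone.
backbone_params = 25_557_032 - (2048 * 1000 + 1000)


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def batchnorm_params(num_features: int) -> int:
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * num_features


head_params = (
    linear_params(2048, 512)   # 1,049,088
    + batchnorm_params(512)    # 1,024
    + linear_params(512, 2)    # 1,026
)

total_params = backbone_params + head_params
print(f"{total_params:,}")  # 24,559,170
```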

	## Training Configuration

	### Hyperparameters
	- Optimizer: AdamW
	- Learning Rate: 0.001 (with cosine annealing)
	- Batch Size: 128
	- Weight Decay: 0.01
	- Dropout: 0.3 before the first linear layer, 0.15 before the output layer
	- Label Smoothing: 0.05
	- Mixed Precision: bfloat16 (H100 optimized)
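To make the learning-rate schedule concrete, here is a minimal sketch of cosine annealing from the stated base rate of 0.001. It assumes the rate is updated once per epoch over the 12 training epochs and decays toward a minimum of 0; the card does not state the schedule length, the step granularity, or the minimum rate.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 1e-3,
                        total_epochs: int = 12, min_lr: float = 0.0) -> float:
    """Learning rate at a given epoch under cosine annealing without restarts."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


# One value per training epoch (epochs 0 through 11).
schedule = [cosine_annealing_lr(epoch) for epoch in range(12)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

This is the same closed form documented for PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`, so the values should match that scheduler when it is stepped once per epoch with `T_max=12`.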

	### Augmentation (Training Only)
	- RandomResizedCrop (scale: 0.8-1.0)
	- RandomHorizontalFlip (p=0.5)
	- RandomRotation (±15°)
	- ColorJitter (brightness, contrast, saturation, hue)
	- Normalization (ImageNet stats)

	### Hardware
	- GPU: NVIDIA H100
	- Training Time: ~6 minutes
	- Inference Speed: ~4ms per image (H100)

	## Usage

	### Installation

	```bash
	pip install torch torchvision pillow huggingface_hub
	```

	### Quick Start

	```python
	import torch
	import torch.nn as nn
	import torchvision.models as models
	from torchvision import transforms
	from PIL import Image
	from huggingface_hub import hf_hub_download

	# Download the checkpoint from the Hub
	model_path = hf_hub_download(
	    repo_id="ash12321/sdxl-detector-resnet50",
	    filename="best.pth"
	)

	# Load the checkpoint on CPU
	checkpoint = torch.load(model_path, map_location='cpu')

	# Recreate the architecture: ResNet-50 backbone with the custom classifier head
	class SDXLDetector(nn.Module):
	    def __init__(self):
	        super().__init__()
	        # weights=None builds the architecture only (torchvision < 0.13: pretrained=False)
	        self.backbone = models.resnet50(weights=None)
	        num_features = self.backbone.fc.in_features
	        self.backbone.fc = nn.Sequential(
	            nn.Dropout(p=0.3),
	            nn.Linear(num_features, 512),
	            nn.BatchNorm1d(512),
	            nn.ReLU(inplace=True),
	            nn.Dropout(p=0.15),
	            nn.Linear(512, 2)
	        )

	    def forward(self, x):
	        return self.backbone(x)

	# Initialize and load weights
	model = SDXLDetector()
	model.load_state_dict(checkpoint['model_state_dict'])
	model.eval()

	# Preprocessing: resize, center-crop to 224×224, normalize with ImageNet statistics
	transform = transforms.Compose([
	    transforms.Resize(256),
	    transforms.CenterCrop(224),
	    transforms.ToTensor(),
	    transforms.Normalize(
	        mean=[0.485, 0.456, 0.406],
	        std=[0.229, 0.224, 0.225]
	    )
	])

	# Predict on a single image
	image = Image.open("test_image.jpg").convert('RGB')
	input_tensor = transform(image).unsqueeze(0)  # add batch dimension

	with torch.no_grad():
	    outputs = model(input_tensor)
	    probs = torch.softmax(outputs, dim=1)
	    prediction = torch.argmax(probs, dim=1).item()
	    confidence = probs[0][prediction].item()

	# Results: class index 0 is Real, index 1 is SDXL-generated
	labels = ['Real', 'SDXL-generated']
	print(f"Prediction: {labels[prediction]}")
	print(f"Confidence: {confidence*100:.2f}%")
	```

	### Batch Prediction

	```python
	import torch
	from PIL import Image
	from torch.utils.data import DataLoader, Dataset

	# Reuses `model` and `transform` from the Quick Start example

	class ImageDataset(Dataset):
	    def __init__(self, image_paths, transform):
	        self.image_paths = image_paths
	        self.transform = transform

	    def __len__(self):
	        return len(self.image_paths)

	    def __getitem__(self, idx):
	        image = Image.open(self.image_paths[idx]).convert('RGB')
	        return self.transform(image)

	# Create dataset and loader
	image_paths = ['image1.jpg', 'image2.jpg']  # extend with your own paths
	dataset = ImageDataset(image_paths, transform)
	loader = DataLoader(dataset, batch_size=32, num_workers=4)

	# Batch inference
	predictions = []
	confidences = []

	model.eval()
	with torch.no_grad():
	    for batch in loader:
	        outputs = model(batch)
	        probs = torch.softmax(outputs, dim=1)
	        preds = torch.argmax(probs, dim=1)   # predicted class index per image
	        confs = torch.max(probs, dim=1)[0]   # probability of the predicted class

	        predictions.extend(preds.cpu().numpy())
	        confidences.extend(confs.cpu().numpy())
	```

	## Limitations

	1. Generator-Specific: Only trained on SDXL 1.0. Will not reliably detect:
	   - Other Stable Diffusion versions (1.5, 2.1, 3.0)
	   - Midjourney, DALL-E, Flux
	   - Other generative models

	2. Resolution-Specific: Optimized for 1024×1024 SDXL images. Performance may degrade on:
	   - Lower resolutions
	   - Higher resolutions
	   - Non-square aspect ratios

	3. Dataset Bias: Trained on specific real image categories (food, animals, vehicles, etc.). May perform differently on:
	   - Artistic images
	   - Abstract images
	   - Specialized domains (medical, satellite, etc.)

	4. Adversarial Attacks: Not hardened against adversarial perturbations

	## Ethical Considerations

	### Intended Applications
	- ✅ Content moderation
	- ✅ Academic research
	- ✅ Digital forensics
	- ✅ Media verification

	### Prohibited Uses
	- ❌ Surveillance without consent
	- ❌ Discrimination or profiling
	- ❌ Bypassing content policies

	### False Positives/Negatives
	- False Positives (0.45%): Real images misclassified as SDXL-generated
	  - May unfairly flag authentic content
	  - Always provide human review for high-stakes decisions

	- False Negatives (0.07%): SDXL images misclassified as real
	  - SDXL-generated content may slip through
	  - Use as part of multi-layer verification
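The human-review recommendation above can be implemented as a simple confidence gate. In this sketch the `route_prediction` helper, the action names, and the 0.90 threshold are hypothetical; a suitable threshold would need to be calibrated on your own data.

```python
def route_prediction(prob_fake: float, review_threshold: float = 0.90) -> str:
    """Map the model's SDXL probability to an action.

    Predictions whose winning-class confidence falls below the threshold
    are sent to a human reviewer instead of being labeled automatically.
    """
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must be a probability in [0, 1]")
    confidence = max(prob_fake, 1.0 - prob_fake)
    if confidence < review_threshold:
        return "human_review"
    return "sdxl_generated" if prob_fake >= 0.5 else "real"


# prob_fake corresponds to probs[0][1] in the Quick Start example.
for p in (0.99, 0.65, 0.03):
    print(p, route_prediction(p))
```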

	### Transparency
	This model should be deployed with clear communication to users about:
	- Its specific purpose (SDXL detection only)
	- Its limitations (not for other generators)
	- Confidence scores for each prediction
	- The possibility of errors

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{sdxl_detector_2025,
	  author = {ash12321},
	  title = {SDXL Detector: ResNet-50 Fine-tuned for SDXL Detection},
	  year = {2025},
	  publisher = {HuggingFace},
	  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-resnet50}},
	}
	```

	## Model Card Authors

	ash12321

	## Model Card Contact

	For questions or issues, please open an issue on the model repository.

	## License

	MIT License

	## Changelog

	### Version 1.0 (2025-12-30)
	- Initial release
	- 99.75% test accuracy on SDXL detection
	- ResNet-50 architecture
	- Trained on 19,034 images (9,034 real + 10,000 SDXL)

	---

	Keywords: SDXL detection, AI image detection, fake image detection, deepfake detection, ResNet-50, image classification, computer vision