---
license: mit
tags:
- satellite-imagery
- building-segmentation
- computer-vision
- semantic-segmentation
- remote-sensing
- pytorch
- u-net
datasets:
- isprs-potsdam
metrics:
- iou
- accuracy
model-index:
- name: satellite-building-segmentation
  results:
  - task:
      type: semantic-segmentation
      name: Satellite Building Segmentation
    dataset:
      type: isprs-potsdam
      name: ISPRS Potsdam
    metrics:
    - type: mean_iou
      value: 0.6562
      name: Mean IoU
    - type: accuracy
      value: 0.8245
      name: Pixel Accuracy
---

# Satellite Building Segmentation

A satellite building segmentation model based on an enhanced U-Net architecture. It predicts six land-cover classes and achieves **65.62% Mean IoU** on the ISPRS Potsdam dataset.

## Model Performance

- **Mean IoU**: 65.62%
- **Pixel Accuracy**: 82.45%
- **Training**: 43 epochs with early stopping
- **Architecture**: Enhanced U-Net with multi-scale features
- **Dataset**: ISPRS Potsdam (6-class segmentation)

## Class Performance

| Class | IoU | Description |
|-------|-----|-------------|
| Impervious | 0.78 | Roads, parking, concrete |
| Buildings | 0.69 | Houses, structures |
| Low Vegetation | 0.65 | Grass, crops, lawns |
| Trees | 0.72 | Forests, large trees |
| Cars | 0.45 | Vehicles |
| Clutter | 0.35 | Mixed/background |

## Model Details

### Architecture

- **Base**: Enhanced U-Net
- **Features**: Multi-scale blocks, skip connections
- **Input**: RGB satellite images (512x512)
- **Output**: 6-class segmentation masks
- **Parameters**: ~31M

### Training Details

- **Dataset**: ISPRS Potsdam 2D Semantic Labeling
- **Resolution**: 5 cm per pixel
- **Epochs**: 43 (early stopping)
- **Batch Size**: 4 (thermally optimized for RTX 3090)
- **Loss**: Combined Focal + Dice with class weights
- **Optimizer**: Adam with differential learning rates
- **Hardware**: NVIDIA RTX 3090

## Usage

### Quick Start

```python
import torch
from PIL import Image
import numpy as np

# Load model. This assumes the checkpoint stores the full pickled model,
# not just a state_dict. PyTorch 2.6+ defaults to weights_only=True, so
# opt out explicitly; only do this for files you trust.
model = torch.load('pytorch_model.bin', map_location='cpu', weights_only=False)
model.eval()

# Load and preprocess image
image = Image.open('satellite_image.tif').convert('RGB')
image = image.resize((512, 512))
# HWC uint8 -> CHW float in [0, 1]
image_tensor = torch.from_numpy(np.array(image)).float().permute(2, 0, 1) / 255.0

# Normalize with the ImageNet channel mean and std
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
image_tensor = (image_tensor - mean) / std

# Predict. argmax over the class dimension gives the label map directly;
# a softmax beforehand would not change the result.
with torch.no_grad():
    outputs = model(image_tensor.unsqueeze(0))
    predictions = torch.argmax(outputs, dim=1)

# Convert to numpy: (512, 512) array of class indices
segmentation = predictions.cpu().numpy()[0]
```

### Class Mapping

```python
CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}
```

## Technical Specifications

### Input Requirements

- **Format**: RGB TIFF or PNG images
- **Size**: Any size (the Quick Start code resizes inputs to 512x512)
- **Channels**: 3 (RGB)
- **Bit Depth**: 8-bit recommended

### Output Format

- **Type**: Integer class indices (0-5)
- **Size**: 512x512
- **Classes**: 6 semantic classes

### Performance Characteristics

- **Inference Speed**: ~50 ms per image (GPU)
- **Memory Usage**: ~2 GB GPU memory
- **Accuracy**: Best on urban/suburban scenes

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{satellite-building-segmentation-2024,
  title={Satellite Building Segmentation using Enhanced U-Net},
  author={Your Name},
  year={2024},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/your-username/satellite-building-segmentation}
}
```

## Contributing

Contributions welcome! Areas for improvement:

- Multi-scale inference
- Attention mechanism optimization
- Additional datasets
- Model compression
- Real-time inference

## License

MIT License. See the LICENSE file for details.
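
## Example: Colorizing a Prediction

The palette in the Class Mapping section can turn the integer mask from Quick Start into an RGB image for visual inspection. The following is a minimal sketch, not part of the released code: the helper name `colorize_mask` is hypothetical, and the small `demo_mask` array stands in for the `segmentation` array produced by Quick Start.

```python
import numpy as np

# Palette copied from the Class Mapping section (class index -> RGB).
CLASS_COLORS = {
    0: [255, 255, 255],  # Impervious (white)
    1: [255, 0, 0],      # Buildings (red)
    2: [0, 255, 0],      # Low vegetation (green)
    3: [0, 255, 255],    # Trees (cyan)
    4: [255, 255, 0],    # Cars (yellow)
    5: [255, 0, 255],    # Clutter (magenta)
}


def colorize_mask(mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image.

    Hypothetical helper for illustration only.
    """
    # Build a (num_classes, 3) lookup table ordered by class index.
    palette = np.array(
        [CLASS_COLORS[i] for i in range(len(CLASS_COLORS))], dtype=np.uint8
    )
    # Integer-array indexing looks up one RGB triple per pixel.
    return palette[mask]


# Tiny synthetic mask standing in for `segmentation` from Quick Start.
demo_mask = np.array([[0, 1], [4, 5]])
demo_rgb = colorize_mask(demo_mask)
```

With a real prediction, `Image.fromarray(colorize_mask(segmentation)).save('segmentation_rgb.png')` would write the colored mask to disk using the PIL import from Quick Start; the output filename here is only an example.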

## Acknowledgments

- ISPRS for the Potsdam dataset
- PyTorch community
- Satellite imagery research community
- Enhanced U-Net architecture research
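
## Example: Computing IoU

The IoU figures reported in this card follow the usual definition: for each class, the number of pixels where prediction and ground truth agree, divided by the number of pixels where either one is present. The sketch below shows one common way to compute it from a confusion matrix. It is not the evaluation script used for this model; the function name `per_class_iou`, its `num_classes` parameter, and the choice to report NaN for classes absent from both masks are assumptions made for illustration.

```python
import numpy as np


def per_class_iou(pred, target, num_classes=6):
    """Return an array of per-class IoU values (NaN where a class never occurs).

    Illustrative helper, not the card author's evaluation code.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    # Union = predicted pixels + ground-truth pixels - overlap.
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, intersection / union, np.nan)


# Synthetic 2x2 example: one Impervious pixel is mislabeled as Buildings.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])

iou = per_class_iou(pred, target)
mean_iou = np.nanmean(iou)  # averages only over classes that occur
```

In the synthetic example, class 0 has one correct pixel out of a union of two, and class 1 has two correct pixels out of a union of three. The card does not state how its Mean IoU was aggregated. As a point of reference, the rounded per-class values in the Class Performance table average to roughly 0.61 over all six classes and roughly 0.66 when Clutter is left out.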