---
license: apache-2.0
datasets:
- canada-guesser/Canadian-streetview-cities
language:
- en
base_model:
- facebook/convnext-tiny-224
- timm/swinv2_base_window12_192.ms_in22k
pipeline_tag: image-classification
tags:
- cnn
- vit
- canada
metrics:
- accuracy
---

# Canadian Street View Classifier

Deep learning models for classifying street-view images of Canadian cities. This repository contains multiple models fine-tuned on the **Canadian Street View Cities** dataset.

---

## Models Included

- **SwinV2**
- **ConvNeXt**

The repository contains a CNN-based model (ConvNeXt) and a Transformer-based model (SwinV2), both trained to predict the city from a street-view image.

---

## Cities Included

Calgary, Charlottetown, Edmonton, Halifax, Hamilton, Kitchener-Waterloo, Montreal, Ottawa-Gatineau, Québec City, Saskatoon, St Johns, Toronto, Vancouver, Victoria, Winnipeg

---

## Model Performance

| Model               | Accuracy | Macro Precision | Macro Recall | Macro F1-Score |
|---------------------|----------|-----------------|--------------|----------------|
| ConvNeXt-tiny       | 0.98980  | 0.98983         | 0.98980      | 0.98980        |
| Swin Transformer V2 | 0.99440  | 0.99439         | 0.99440      | 0.99439        |

Performance was evaluated on the test split of the Canadian Street View Cities dataset. Both models achieve high accuracy, with Swin Transformer V2 outperforming ConvNeXt-tiny by about 0.46 percentage points (99.44% vs. 98.98%).

### Known Limitations

These models were trained on images sourced from Mapillary. As a result, their performance may be lower on street-view images from other datasets or sources, due to differences in image style, quality, or perspective.

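### Reading the macro-averaged metrics

The performance table reports macro-averaged precision, recall, and F1. The sketch below shows the standard definition of macro averaging (compute each metric per class, then take the unweighted mean over classes) on a toy 3-class example. It is an illustration only: the function name `macro_metrics` and the toy labels are hypothetical, and the model card does not state which tool the authors used to compute their numbers.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Labels are integer class indices in the range [0, num_classes).
    """
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def safe_div(a, b):
        # Convention used here: a metric with an empty denominator counts as 0.
        return a / b if b else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    accuracy = safe_div(sum(tp), len(y_true))
    # Macro average: every class contributes equally, regardless of its size.
    return (
        accuracy,
        sum(precisions) / num_classes,
        sum(recalls) / num_classes,
        sum(f1s) / num_classes,
    )


# Toy example with 3 hypothetical classes (not the real test data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, rec, f1 = macro_metrics(y_true, y_pred, num_classes=3)
print(f"accuracy={acc:.5f} precision={prec:.5f} recall={rec:.5f} f1={f1:.5f}")
```

When every class has the same number of test images, macro recall equals accuracy, as in the toy example. The table shows the same pattern for both models, which would be consistent with a class-balanced test split, although the card does not state this explicitly.
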

---

## Demo

Try the model live in a Space: [Canadian StreetView Classifier](https://huggingface.co/spaces/canada-guesser/canadian_streetview_cities_demo)

---

## Usage Example

### Installation

```bash
pip install torch torchvision timm huggingface_hub
```

### Download Model Weights

```python
from huggingface_hub import hf_hub_download

# Download the fine-tuned SwinV2 checkpoint (cached locally after the first call)
vit_path = hf_hub_download(
    repo_id="canada-guesser/canadian_streetview_cities_models",
    filename="vit_model/swinv2_base_window12_192_0_finetuned_canadian_streetview.bin"
)
```

### Initialize model

```python
import torch
import timm

# Build the architecture with a 15-class head (one class per city), then load the fine-tuned weights
model = timm.create_model("swinv2_base_window12_192", pretrained=False, num_classes=15)
model.load_state_dict(torch.load(vit_path, map_location="cpu"))
model.eval()  # inference mode: disables dropout and other training-only behavior
```

### Transform and predict

```python
from PIL import Image
from torchvision import transforms

# Resize to the model's 192x192 input size and normalize to [-1, 1]
transform = transforms.Compose([
    transforms.Resize((192, 192)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])

# Class index -> city name, in the order of the model's output logits
class_names = [
    "Calgary", "Charlottetown", "Edmonton", "Halifax", "Hamilton",
    "Kitchener-Waterloo", "Montreal", "Ottawa-Gatineau", "Quebec City",
    "Saskatoon", "St Johns", "Toronto", "Vancouver", "Victoria", "Winnipeg",
]

img = Image.open("img.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 192, 192)

with torch.no_grad():
    pred = model(x)  # logits of shape (1, 15)

print(class_names[pred.argmax(dim=1).item()])
```

## Citation

If you use this dataset or these models, please cite:

1. Stephen Rebel, Danial McIntyre, Sharav Bali. *Canadian Street View Classifier*. Hugging Face, 2025.