---
license: apache-2.0
datasets:
- canada-guesser/Canadian-streetview-cities
language:
- en
base_model:
- facebook/convnext-tiny-224
- timm/swinv2_base_window12_192.ms_in22k
pipeline_tag: image-classification
tags:
- cnn
- vit
- canada
metrics:
- accuracy
---

# Canadian Street View Classifier

Deep learning models for classifying street-view images of Canadian cities. This repository contains multiple models fine-tuned on the **Canadian Street View Cities** dataset.

---

## Models Included
- **SwinV2**  
- **ConvNeXt**  

The repository contains one CNN-based model (ConvNeXt) and one Transformer-based model (SwinV2), both trained to predict the city from a street-view image.

---

## Cities Included
Calgary, Charlottetown, Edmonton, Halifax, Hamilton, Kitchener-Waterloo, Montreal, Ottawa-Gatineau, Québec City, Saskatoon, St Johns, Toronto, Vancouver, Victoria, Winnipeg

---

## Model Performance

| Model               | Accuracy | Macro Precision | Macro Recall | Macro F1-Score |
|---------------------|----------|-----------------|--------------|----------------|
| ConvNeXt-tiny       | 0.98980  | 0.98983         | 0.98980      | 0.98980        |
| Swin Transformer V2 | 0.99440  | 0.99439         | 0.99440      | 0.99439        |

Performance was evaluated on the test split of the Canadian Street View Cities dataset. Both models score above 0.989 on every reported metric, with Swin Transformer V2 (0.9944 accuracy) slightly outperforming ConvNeXt-tiny (0.9898 accuracy).
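For readers unfamiliar with macro averaging, the sketch below shows how the macro precision, recall, and F1 columns are commonly computed: each metric is calculated per city and then averaged with equal weight per class. It is an illustration only, not the evaluation script used for the table; the function name `macro_metrics` and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but incorrectly
            fn[t] += 1  # class t was the truth, but was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    accuracy = sum(tp) / len(y_true)
    # Macro average: unweighted mean over classes.
    return (
        accuracy,
        sum(precisions) / num_classes,
        sum(recalls) / num_classes,
        sum(f1s) / num_classes,
    )


# Toy example with 3 classes and 2 samples per class (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc, macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred, num_classes=3)
print(f"acc={acc:.4f} P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

When every class has the same number of test images, macro recall equals accuracy, which is consistent with the matching Accuracy and Macro Recall columns in the table.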

### Known Limitations

These models were trained on images sourced from Mapillary. As a result, their performance may be lower when applied to street-view images from other datasets or sources, due to differences in image style, quality, or perspective.

---

## Demo

Try the model live in a Space: [Canadian StreetView Classifier](https://huggingface.co/spaces/canada-guesser/canadian_streetview_cities_demo)

---

## Usage Example

### Installation

```bash
pip install torch torchvision timm huggingface_hub
```

### Download Model Weights

```python
from huggingface_hub import hf_hub_download

vit_path = hf_hub_download(
    repo_id="canada-guesser/canadian_streetview_cities_models",
    filename="vit_model/swinv2_base_window12_192_0_finetuned_canadian_streetview.bin"
)
```

### Initialize Model
```python
import torch
import timm

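# Build the architecture without pretrained weights; 15 outputs, one per city.
model = timm.create_model("swinv2_base_window12_192", pretrained=False, num_classes=15)
# Load the fine-tuned weights downloaded in the previous step.
model.load_state_dict(torch.load(vit_path, map_location="cpu"))
model.eval()  # switch to inference mode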
```

### Transform and Predict
```python
from PIL import Image
from torchvision import transforms

# Resize to the 192x192 input size of swinv2_base_window12_192, then normalize.
transform = transforms.Compose([
    transforms.Resize((192, 192)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

class_names = [
    "Calgary", "Charlottetown", "Edmonton", "Halifax", "Hamilton",
    "Kitchener-Waterloo", "Montreal", "Ottawa-Gatineau", "Quebec City", "Saskatoon",
    "St Johns", "Toronto", "Vancouver", "Victoria", "Winnipeg",
]

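img = Image.open("img.jpg").convert("RGB")  # replace with your image path
x = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 192, 192)

with torch.no_grad():
    pred = model(x)  # raw logits with shape (1, 15)

# The index of the largest logit selects the predicted city.
print(class_names[pred.argmax(dim=1).item()])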
```
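The example above prints only the single most likely city. As a minimal sketch of how the raw logits could be turned into ranked probabilities, the helper below applies a softmax and returns the top `k` cities. The function name `top_k_cities` and the toy logits are hypothetical and not part of this repository.

```python
import math


def top_k_cities(logits, class_names, k=3):
    """Return the k most likely (city, probability) pairs from raw logits."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(v - max_logit) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort cities by probability, highest first.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy three-class case with made-up logits (illustration only).
toy_names = ["Calgary", "Toronto", "Vancouver"]
toy_logits = [0.5, 2.0, 1.0]
for city, prob in top_k_cities(toy_logits, toy_names, k=2):
    print(f"{city}: {prob:.3f}")
```

With the `pred` tensor from the prediction example, `top_k_cities(pred[0].tolist(), class_names)` would rank the 15 cities. The same result can be obtained directly in PyTorch with `torch.softmax(pred, dim=1).topk(3)`.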

## Citation

If you use the dataset or these models, please cite:

1. Stephen Rebel, Danial McIntyre, Sharav Bali. *Canadian Street View Classifier*. Hugging Face, 2025.