BrandSpotter: Logo Detection and Brand Identification for Sports Broadcasting

BrandSpotter is a multi-stage computer vision pipeline built to detect and identify brand logos in broadcast video, with applications in sponsor visibility measurement and digital out-of-home (DOOH) advertising analytics.

Problem: Broadcasters and sponsors need to quantify how often, and how clearly, brand logos appear on screen during live sports. This requires detecting logos at broadcast speed, classifying them by brand, and handling real-world challenges like motion blur, partial occlusion, camera angle variation, and lighting washout.

Approach: A three-stage pipeline:

1. YOLO11m for single-class logo region detection (this repo, yolo/)
2. ResNet50 for brand classification with open-set rejection (this repo, resnet/)
3. Frame-level analytics for dwell-time measurement and visibility scoring
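
The analytics stage is not shown in this card, so the following is only a minimal sketch of how dwell time could be derived from per-frame brand labels. The function name `dwell_time_seconds`, the input layout (one set of brand names per frame), and the `max_gap` smoothing parameter are assumptions made for illustration, not the project's API.

```python
from collections import defaultdict


def dwell_time_seconds(frames, fps, max_gap=0):
    """Sum on-screen time per brand from per-frame detections.

    frames  : list of sets; frames[i] holds the brand names seen in frame i
    fps     : frames per second of the video
    max_gap : hypothetical smoothing knob; a dropout of up to this many
              frames between two sightings is still counted as visible
    """
    visible_frames = defaultdict(int)
    last_seen = {}
    for idx, brands in enumerate(frames):
        for brand in brands:
            prev = last_seen.get(brand)
            if prev is not None and 0 < idx - prev - 1 <= max_gap:
                # Bridge a short detection dropout (e.g. motion blur).
                visible_frames[brand] += idx - prev - 1
            visible_frames[brand] += 1
            last_seen[brand] = idx
    return {brand: n / fps for brand, n in visible_frames.items()}


# Toy input: five frames at 25 fps, with one missed "nike" frame at index 2.
frames = [{"nike"}, {"nike", "adidas"}, set(), {"nike"}, {"adidas"}]
dwell = dwell_time_seconds(frames, fps=25, max_gap=1)
```

With `max_gap=1`, the single-frame dropout for "nike" is bridged, while the two-frame gap for "adidas" is not.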

Source code: github.com/daa2618/brandspotter

Models

YOLO11m: Logo Detection (`yolo/`)

Fine-tuned YOLO11m for single-class logo detection on LogoDet-3K.

Metric	Value
mAP@0.5	0.894
mAP@0.5:0.95	0.639
Precision	0.829
Recall	0.863
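
For readers unfamiliar with the two mAP rows: mAP@0.5 counts a predicted box as a match when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, while mAP@0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU test itself, using made-up boxes; it is not the evaluation code used to produce the table.

```python
def iou_xyxy(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP@0.5:0.95.
thresholds = [0.50 + 0.05 * i for i in range(10)]

# Illustrative boxes only: the prediction is shifted 10 px right of the truth.
pred = (10, 10, 110, 60)
truth = (20, 10, 120, 60)
score = iou_xyxy(pred, truth)
matched_at = [t for t in thresholds if score >= t]
```

A box that matches comfortably at IoU 0.5 can still fail the stricter thresholds, which is why the second mAP figure is lower than the first.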

Training configuration:

Base model: yolo11m.pt (COCO-pretrained)
Epochs: 50 (best checkpoint at epoch 47)
Image size: 640x640
Optimizer: AdamW (auto-selected)
Learning rate: 0.001
Batch size: auto
Hardware: Google Colab T4 GPU (~2 hours)
Dataset: LogoDet-3K (158,652 images, 3,000 classes collapsed to single "logo" class)
Augmentation: mosaic, RandAugment, erasing (0.4), horizontal flip (0.5)

Design rationale: A single-class detector maximises recall across all logo types, delegating brand-specific classification to the downstream ResNet stage. This separation allows the detector to generalise to unseen brands without retraining.

ResNet50: Brand Classification (`resnet/`)

Fine-tuned ResNet50 classifying logo crops into 35 known brands from the Sport and Clothing super-classes of LogoDet-3K, with entropy-based open-set rejection for brands outside the training set. The full index-to-brand mapping is in resnet/class_map.json.
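
As a rough illustration of entropy-based rejection, the sketch below computes the Shannon entropy of the softmax output and rejects a crop when that entropy exceeds a threshold. It assumes the rejection score is the plain softmax entropy, and it includes the energy score (negative log-sum-exp of the logits at temperature 1) for comparison. The function names, the toy logits, and the threshold of 1.0 nats are hypothetical; the tuned values live in resnet/openset_thresholds.yaml, whose layout is not shown here.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits):
    """Shannon entropy (nats) of the softmax distribution; high = uncertain."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def energy_score(logits):
    """Negative log-sum-exp of the logits (temperature 1); high = uncertain."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def classify_or_reject(logits, entropy_threshold):
    """Return the arg-max class index, or None if the crop is rejected."""
    if entropy_score(logits) > entropy_threshold:
        return None
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy 4-class logits (the real model has 35 outputs).
confident = [9.0, 0.5, 0.2, 0.1]
ambiguous = [1.1, 1.0, 0.9, 1.0]
known_idx = classify_or_reject(confident, entropy_threshold=1.0)
rejected = classify_or_reject(ambiguous, entropy_threshold=1.0)
```

For a 35-class model the entropy ranges from 0 (one class certain) to ln 35 ≈ 3.56 nats (uniform), so any threshold falls within that interval.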

Closed-set performance (held-out test set, 552 crops, 35 classes):

Metric	Value
Top-1 accuracy	0.889
Top-5 accuracy	0.966
Macro F1	0.895
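
The three closed-set metrics can be computed as in the sketch below. This is a plain-Python illustration on toy data, not the project's evaluation script; the function names and inputs are assumptions.

```python
def top_k_accuracy(ranked_preds, labels, k):
    """Share of samples whose true label is among the k highest-ranked classes."""
    hits = sum(1 for ranked, y in zip(ranked_preds, labels) if y in ranked[:k])
    return hits / len(labels)


def macro_f1(preds, labels):
    """Unweighted mean of per-class F1 over all classes that appear."""
    classes = sorted(set(labels) | set(preds))
    f1s = []
    for c in classes:
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy data: 4 samples, 3 classes; each row lists classes from most to least likely.
labels = [0, 0, 1, 2]
ranked = [[0, 1, 2], [1, 0, 2], [1, 2, 0], [0, 2, 1]]
top1 = top_k_accuracy(ranked, labels, k=1)
top2 = top_k_accuracy(ranked, labels, k=2)
f1 = macro_f1([r[0] for r in ranked], labels)
```

Macro F1 weights every class equally regardless of how many crops it has, which is why it can differ from top-1 accuracy when the test set is imbalanced.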

Open-set rejection (843 crops from 15 brands never seen in training):

Metric	Entropy (selected)	Energy
AUROC (known vs unknown)	0.897	0.885
FPR @ 95% TPR	0.521	0.612
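
AUROC here can be read as the probability that a randomly chosen unknown-brand crop receives a higher rejection score than a randomly chosen known-brand crop. The sketch below computes it by direct pairwise comparison, assuming "higher score = more likely unknown" (as with entropy). The scores are invented for illustration and the function name is hypothetical.

```python
def auroc(known_scores, unknown_scores):
    """Pairwise AUROC: P(unknown score > known score), ties counted as half."""
    wins = 0.0
    for u in unknown_scores:
        for k in known_scores:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


# Toy rejection scores, not model outputs.
known = [0.1, 0.4, 0.9]
unknown = [0.5, 1.2]
area = auroc(known, unknown)
```

A value of 0.5 would mean the score carries no information about known versus unknown; 1.0 would mean perfect separation.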

Known v1 limitation: at the default operating point (95% of known-brand crops accepted), 52% of unknown-brand crops still pass through as a known label. Lowering the target TPR to 90% roughly halves that leak (FPR ≈ 0.29). Tuned thresholds are in resnet/openset_thresholds.yaml; pick the operating point that fits your application.
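
To make the trade-off concrete, the sketch below picks the rejection threshold that accepts a target share of known-brand crops and then measures how many unknown-brand crops slip through at that threshold. It assumes the convention "lower score = more confident" (as with entropy); the function names and all scores are hypothetical toy values, not outputs of this model.

```python
import math


def threshold_at_tpr(known_scores, target_tpr):
    """Smallest threshold that accepts at least target_tpr of known crops."""
    ordered = sorted(known_scores)
    # round() guards against float error such as 0.95 * 20 = 19.000000000000004
    n_accept = max(1, math.ceil(round(target_tpr * len(ordered), 9)))
    return ordered[n_accept - 1]


def fpr_at_threshold(unknown_scores, threshold):
    """Share of unknown crops that pass (score <= threshold) as a known label."""
    return sum(s <= threshold for s in unknown_scores) / len(unknown_scores)


# Toy entropy scores for 10 known-brand and 8 unknown-brand crops.
known = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.2, 2.0]
unknown = [0.3, 0.7, 0.9, 1.0, 1.5, 2.5, 2.8, 3.0]

thr_90 = threshold_at_tpr(known, 0.90)
thr_80 = threshold_at_tpr(known, 0.80)
fpr_90 = fpr_at_threshold(unknown, thr_90)
fpr_80 = fpr_at_threshold(unknown, thr_80)
```

In this toy case, giving up 10 points of known-brand acceptance halves the leak, the same kind of trade described above.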

Training configuration (full resolved config in resnet/config_resolved.yaml):

Base model: resnet50 (ImageNet-pretrained)
Phase 1: 5 epochs, classifier head only, lr=1e-3
Phase 2: 25 epochs, layer4 unfrozen, lr=1e-4 with cosine decay
Batch size: 64, weighted class sampling, AMP
Input: 224x224 crops (5% padding around ground-truth boxes)
Augmentation: rotation (10°), colour jitter (0.2), random resized crop (scale 0.7-1.0); no horizontal flip, since logo text is orientation-sensitive
Hardware: Google Colab T4 GPU
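
At inference time, detector boxes would presumably be padded the same way before being passed to the classifier. The sketch below shows one way to apply the padding, assuming "5% padding" means 5% of the box width and height added on each side and clipped to the image bounds. That interpretation and the function name `padded_crop_box` are assumptions.

```python
def padded_crop_box(box, image_size, pad_frac=0.05):
    """Expand an (x1, y1, x2, y2) box by pad_frac per side, clipped to the image."""
    x1, y1, x2, y2 = box
    img_w, img_h = image_size
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    return (
        max(0, round(x1 - pad_x)),
        max(0, round(y1 - pad_y)),
        min(img_w, round(x2 + pad_x)),
        min(img_h, round(y2 + pad_y)),
    )


# A box in the middle of a 1920x1080 frame, and one touching the left edge.
inner = padded_crop_box((100, 200, 300, 260), image_size=(1920, 1080))
edge = padded_crop_box((0, 10, 100, 50), image_size=(1920, 1080))
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to it directly before resizing to 224x224.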

Quick Start

Detection (YOLO11m)

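from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the detector weights from the Hugging Face Hub (cached locally)
weights = hf_hub_download("vectorized-dev/brandspotter", "yolo/best.pt")
model = YOLO(weights)

# Run inference and display the annotated image
results = model("path/to/image.jpg")
results[0].show()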

Classification (ResNet50)

from huggingface_hub import hf_hub_download
from brandspotter.classify import BrandClassifier  # pip install from the GitHub repo

weights = hf_hub_download("vectorized-dev/brandspotter", "resnet/best.pt")
classifier = BrandClassifier(weights)

result = classifier.predict("path/to/logo_crop.jpg")
print(result)  # top-k (brand, probability) pairs

The checkpoint is self-describing (it embeds the class map, architecture, and training config), so it can also be loaded with plain torchvision:

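import torch
from torchvision.models import resnet50

# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust
ckpt = torch.load("best.pt", map_location="cpu", weights_only=False)
model = resnet50(num_classes=ckpt["num_classes"])
model.load_state_dict(ckpt["model_state"])
model.eval()  # inference mode: uses stored batch-norm statistics
class_map = ckpt["class_map"]  # {index: brand_name}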

Repository Contents

yolo/
  best.pt                  # Trained weights (best checkpoint, ~39 MB)
  args.yaml                # Full training arguments
  results.csv              # Per-epoch training metrics
resnet/
  best.pt                  # Trained weights (best val top-1, ~90 MB)
  class_map.json           # Index -> brand mapping (35 classes)
  openset_thresholds.yaml  # Tuned entropy/energy rejection thresholds
  config_resolved.yaml     # Full resolved training config
  results.csv              # Per-epoch training metrics

Roadmap

ResNet50 brand classifier weights and evaluation
Open-set rejection threshold calibration
End-to-end inference script (detect + classify + dwell-time)
Sample results on sports broadcast footage
Dataset card for curated brand dictionary

Dataset

LogoDet-3K (Wang et al., ACM TOMM 2022). 158,652 images across 3,000 logo classes. The detection model treats all logos as a single class for region proposal; brand identification is handled downstream by the ResNet stage (35 known classes, 15 held out for open-set evaluation).

Citation

@article{wang2022logodet3k,
  title={LogoDet-3K: A Large-scale Image Dataset for Logo Detection},
  author={Wang, Jing and Min, Weiqing and Hou, Sujuan and Ma, Shengnan and Zheng, Yuanjie and Jiang, Shuqiang},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications},
  volume={18},
  number={3},
  year={2022},
  publisher={ACM}
}

License

The BrandSpotter code is MIT-licensed. Additional context for these weights:

The YOLO11m detector is fine-tuned from ultralytics YOLO11, which is AGPL-3.0 licensed; review those terms (or Ultralytics' commercial licensing) before deploying the detection weights in a product or network service.
Both models are trained on LogoDet-3K, which its authors distribute for research purposes without an explicit license. Cite the dataset and review its terms for commercial use.
