BrandSpotter: Logo Detection and Brand Identification for Sports Broadcasting
BrandSpotter is a multi-stage computer vision pipeline built to detect and identify brand logos in broadcast video, with applications in sponsor visibility measurement and digital out-of-home (DOOH) advertising analytics.
Problem: Broadcasters and sponsors need to quantify how often, and how clearly, brand logos appear on screen during live sports. This requires detecting logos at broadcast speed, classifying them by brand, and handling real-world challenges like motion blur, partial occlusion, camera angle variation, and lighting washout.
Approach: A three-stage pipeline:
- YOLO11m for single-class logo region detection (this repo, yolo/)
- ResNet50 for brand classification with open-set rejection (this repo, resnet/)
- Frame-level analytics for dwell-time measurement and visibility scoring
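The third stage is only described at a high level here. As a minimal sketch of what frame-level dwell-time measurement could look like, the snippet below assumes that stages 1 and 2 have already produced one (frame index, brand) pair per classified detection and that the frame rate is known. The function name `dwell_time_seconds`, the input layout, and the placeholder brand names are illustrative assumptions, not part of the BrandSpotter code.

```python
from collections import defaultdict


def dwell_time_seconds(frame_brands, fps):
    """Sum on-screen time per brand.

    frame_brands: iterable of (frame_index, brand) pairs, one per classified
        detection. A brand detected several times in one frame counts once.
    fps: frames per second of the source video.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames_per_brand = defaultdict(set)
    for frame_index, brand in frame_brands:
        frames_per_brand[brand].add(frame_index)
    return {brand: len(frames) / fps for brand, frames in frames_per_brand.items()}


# Hypothetical detections from a 25 fps clip (placeholder brand names).
detections = [
    (0, "brand_a"), (0, "brand_a"),  # two boxes of the same brand in frame 0
    (1, "brand_a"), (1, "brand_b"),
    (2, "brand_a"),
]
dwell = dwell_time_seconds(detections, fps=25)
```

Counting distinct frames rather than detections keeps a brand that appears on several boards at once from inflating its dwell time; a visibility score would need extra inputs (box area, confidence) that this sketch leaves out.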
Source code: github.com/daa2618/brandspotter
Models
YOLO11m: Logo Detection (yolo/)
Fine-tuned YOLO11m for single-class logo detection on LogoDet-3K.
Training configuration:
- Base model: yolo11m.pt (COCO-pretrained)
- Epochs: 50 (best checkpoint at epoch 47)
- Image size: 640x640
- Optimizer: AdamW (auto-selected)
- Learning rate: 0.001
- Batch size: auto
- Hardware: Google Colab T4 GPU (~2 hours)
- Dataset: LogoDet-3K (158,652 images, 3,000 classes collapsed to single "logo" class)
- Augmentation: mosaic, RandAugment, erasing (0.4), horizontal flip (0.5)
Design rationale: A single-class detector maximises recall across all logo types, delegating brand-specific classification to the downstream ResNet stage. This separation allows the detector to generalise to unseen brands without retraining.
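One way to collapse a multi-class annotation set into a single "logo" class is to rewrite the class index in each label file. The sketch below assumes the annotations have already been converted to YOLO text format (one `class x_center y_center width height` line per box); the helper name `collapse_to_single_class` is hypothetical. Ultralytics also exposes a `single_cls` training argument; which route was used for this model is not stated here.

```python
def collapse_to_single_class(label_text, class_id=0):
    """Rewrite the class index of every YOLO-format label line to class_id."""
    out_lines = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 5:
            raise ValueError(f"malformed label line: {line!r}")
        # Keep the box coordinates untouched; only the class index changes.
        out_lines.append(" ".join([str(class_id)] + parts[1:]))
    return "\n".join(out_lines)


# Two boxes with made-up brand indices, collapsed to class 0 ("logo").
labels = "1742 0.50 0.40 0.20 0.10\n88 0.25 0.75 0.10 0.05\n"
collapsed = collapse_to_single_class(labels)
```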
ResNet50: Brand Classification (resnet/)
Fine-tuned ResNet50 classifying logo crops into 35 known brands from the
Sport and Clothing super-classes of LogoDet-3K, with entropy-based open-set
rejection for brands outside the training set. The full index-to-brand mapping
is in resnet/class_map.json.
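To make the two open-set scores concrete, here is a minimal, dependency-free sketch of softmax entropy and the energy score computed from a classifier's logits, plus a threshold-based reject rule. The function names, the toy logits, and the threshold value are assumptions for illustration; the repository's own scoring code and tuned thresholds may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits):
    """Shannon entropy (in nats) of the softmax; higher means less certain."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def energy_score(logits, temperature=1.0):
    """Energy = -T * logsumexp(logits / T); higher suggests an unknown input."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def classify_or_reject(logits, entropy_threshold):
    """Return the argmax class index, or None when entropy exceeds the threshold."""
    if entropy_score(logits) > entropy_threshold:
        return None
    return max(range(len(logits)), key=logits.__getitem__)


confident = [9.0, 0.5, 0.1, 0.0]   # one class dominates: low entropy
ambiguous = [1.0, 1.0, 1.0, 1.0]   # uniform: entropy equals log(4)
accepted = classify_or_reject(confident, entropy_threshold=0.5)
rejected = classify_or_reject(ambiguous, entropy_threshold=0.5)
```

Both scores rank inputs by "unknown-ness", so the same thresholding logic applies to either one.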
Closed-set performance (held-out test set, 552 crops, 35 classes):
| Metric | Value |
|---|---|
| Top-1 accuracy | 0.889 |
| Top-5 accuracy | 0.966 |
| Macro F1 | 0.895 |
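For readers who want to reproduce metrics of this kind on their own crops, the following sketch shows how top-k accuracy and macro F1 are conventionally computed. It is a plain-Python illustration on toy data, not the evaluation script used for the numbers above; the function names are hypothetical.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """Fraction of samples whose true label is among the k highest-ranked classes.

    ranked_predictions[i] lists class indices sorted by descending score.
    """
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, labels))
    return hits / len(labels)


def macro_f1(predictions, labels):
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(labels) | set(predictions))
    f1_scores = []
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data: 4 samples, 3 classes.
labels = [0, 1, 2, 2]
ranked = [[0, 1, 2], [2, 1, 0], [2, 0, 1], [1, 2, 0]]
top1 = top_k_accuracy(ranked, labels, k=1)
top2 = top_k_accuracy(ranked, labels, k=2)
f1 = macro_f1([r[0] for r in ranked], labels)
```

Because macro F1 averages per-class scores without weighting by class frequency, it can sit above or below top-1 accuracy when the test set is imbalanced.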
Open-set rejection (843 crops from 15 brands never seen in training):
| Metric | Entropy (selected) | Energy |
|---|---|---|
| AUROC (known vs unknown) | 0.897 | 0.885 |
| FPR @ 95% TPR | 0.521 | 0.612 |
Known v1 limitation: at the default operating point (95% of known-brand crops accepted), 52% of unknown-brand crops still pass through as a known label. Lowering the target TPR to 90% roughly halves that leak (FPR ≈ 0.29). Tuned thresholds are in resnet/openset_thresholds.yaml; pick the operating point that fits your application.
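The trade-off between target TPR and leaked unknowns can be sketched as follows. The code assumes an "unknown-ness" score such as entropy, where a crop is accepted when its score is at or below the threshold; TPR is the share of known-brand crops accepted and FPR the share of unknown-brand crops accepted. The function names and the toy scores are made up for illustration and are unrelated to the model's evaluation data.

```python
import math


def threshold_at_tpr(known_scores, target_tpr):
    """Smallest threshold that accepts at least target_tpr of known-brand crops."""
    if not known_scores:
        raise ValueError("known_scores must not be empty")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    ordered = sorted(known_scores)
    # Small epsilon guards against float error in target_tpr * n.
    n_accept = max(1, math.ceil(target_tpr * len(ordered) - 1e-9))
    return ordered[n_accept - 1]


def false_positive_rate(unknown_scores, threshold):
    """Fraction of unknown-brand crops that pass through as a known label."""
    return sum(s <= threshold for s in unknown_scores) / len(unknown_scores)


# Toy entropy scores (illustrative only).
known = [0.1, 0.2, 0.3, 0.9]
unknown = [0.25, 0.5, 0.8, 1.2]

strict = threshold_at_tpr(known, target_tpr=0.75)   # accepts 3 of 4 known crops
lenient = threshold_at_tpr(known, target_tpr=1.0)   # accepts all known crops
fpr_strict = false_positive_rate(unknown, strict)
fpr_lenient = false_positive_rate(unknown, lenient)
```

Lowering the target TPR tightens the threshold, so fewer unknown crops leak through at the cost of rejecting more genuine known-brand crops.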
Training configuration (full resolved config in resnet/config_resolved.yaml):
- Base model: resnet50 (ImageNet-pretrained)
- Phase 1: 5 epochs, classifier head only, lr=1e-3
- Phase 2: 25 epochs, layer4 unfrozen, lr=1e-4 with cosine decay
- Batch size: 64, weighted class sampling, AMP
- Input: 224x224 crops (5% padding around ground-truth boxes)
- Augmentation: rotation (10°), colour jitter (0.2), random resized crop (scale 0.7-1.0); no horizontal flip, since logo text is orientation-sensitive
- Hardware: Google Colab T4 GPU
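The crop padding listed above matters at inference time too, since detector boxes should be cropped the same way the training crops were. The sketch below shows one plausible reading of "5% padding": each side of the box is extended by 5% of the box's width or height, then clipped to the image. That interpretation and the helper name `padded_crop_box` are assumptions; check the source repository for the exact rule.

```python
def padded_crop_box(box, image_size, padding=0.05):
    """Expand an (x1, y1, x2, y2) pixel box by `padding` of its size per side.

    image_size is (width, height); the result is clipped to the image bounds.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    pad_x = (x2 - x1) * padding
    pad_y = (y2 - y1) * padding
    return (
        max(0, round(x1 - pad_x)),
        max(0, round(y1 - pad_y)),
        min(width, round(x2 + pad_x)),
        min(height, round(y2 + pad_y)),
    )


# A 200x100 box inside a 640x480 frame grows by 10 px and 5 px per side.
crop_box = padded_crop_box((100, 200, 300, 300), image_size=(640, 480))
# A box touching the image border is clipped rather than extended.
edge_box = padded_crop_box((0, 0, 100, 100), image_size=(100, 100))
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to it directly before resizing to 224x224.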
Quick Start
Detection (YOLO11m)
from ultralytics import YOLO
# Load directly from HuggingFace
model = YOLO("hf://vectorized-dev/brandspotter/yolo/best.pt")
# Run inference
results = model("path/to/image.jpg")
results[0].show()
Classification (ResNet50)
from huggingface_hub import hf_hub_download
from brandspotter.classify import BrandClassifier # pip install from the GitHub repo
weights = hf_hub_download("vectorized-dev/brandspotter", "resnet/best.pt")
classifier = BrandClassifier(weights)
result = classifier.predict("path/to/logo_crop.jpg")
print(result) # top-k (brand, probability) pairs
The checkpoint is self-describing (it embeds the class map, architecture, and training config), so it can also be loaded with plain torchvision:
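import torch
from torchvision.models import resnet50
# "best.pt" is the local path to resnet/best.pt. weights_only=False unpickles
# arbitrary objects, so only load checkpoints you trust.
ckpt = torch.load("best.pt", map_location="cpu", weights_only=False)
model = resnet50(num_classes=ckpt["num_classes"])
model.load_state_dict(ckpt["model_state"])
model.eval() # switch to inference mode
class_map = ckpt["class_map"] # {index: brand_name}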
Repository Contents
yolo/
best.pt # Trained weights (best checkpoint, ~39 MB)
args.yaml # Full training arguments
results.csv # Per-epoch training metrics
resnet/
best.pt # Trained weights (best val top-1, ~90 MB)
class_map.json # Index -> brand mapping (35 classes)
openset_thresholds.yaml # Tuned entropy/energy rejection thresholds
config_resolved.yaml # Full resolved training config
results.csv # Per-epoch training metrics
Roadmap
- ResNet50 brand classifier weights and evaluation
- Open-set rejection threshold calibration
- End-to-end inference script (detect + classify + dwell-time)
- Sample results on sports broadcast footage
- Dataset card for curated brand dictionary
Dataset
LogoDet-3K (Wang et al., ACM TOMM 2022). 158,652 images across 3,000 logo classes. The detection model treats all logos as a single class for region proposal; brand identification is handled downstream by the ResNet stage (35 known classes, 15 held out for open-set evaluation).
Citation
@article{wang2022logodet3k,
title={LogoDet-3K: A Large-scale Image Dataset for Logo Detection},
author={Wang, Jing and Min, Weiqing and Hou, Sujuan and Ma, Shengnan and Zheng, Yuanjie and Jiang, Shuqiang},
journal={ACM Transactions on Multimedia Computing, Communications, and Applications},
volume={18},
number={3},
year={2022},
publisher={ACM}
}
License
The BrandSpotter code is MIT-licensed. Additional context for these weights:
- The YOLO11m detector is fine-tuned from Ultralytics YOLO11, which is licensed under AGPL-3.0; review those terms (or Ultralytics' commercial licensing) before deploying the detection weights in a product or network service.
- Both models are trained on LogoDet-3K, which its authors distribute for research purposes without an explicit license. Cite the dataset and review its terms for commercial use.