data_label_factory.identify: open-set image retrieval

The companion to the main labeling pipeline. Where the base data_label_factory produces COCO labels for training a closed-set detector, this subpackage produces a CLIP-based retrieval index for open-set identification: given a known set of N reference images, identify which one a webcam frame is showing.

Use this when:

  • You have 1 image per class (a product catalog, a card collection, an art portfolio, a parts diagram, …) and want a "what is this thing I'm holding up?" tool.
  • You want zero training time by default and the option to fine-tune for more accuracy.
  • You want to add new items in seconds by dropping a JPG in a folder and re-indexing.
  • You want rarity / variant detection for free: different prints of the same item indexed under filenames that encode the variant.

Use the base pipeline instead when:

  • You need to detect multiple object instances per image with bounding boxes
  • Your objects appear in cluttered scenes and need a real detector
  • You have many images per class and want a closed-set classifier

The 4-step blueprint (works for ANY image set)

This is the entire workflow. Replace ~/my-collection/ with your reference folder and you're done.

Step 0: install (one-time, ~1 min)

pip install -e ".[identify]"
# This pulls torch, pillow, clip, fastapi, ultralytics, and uvicorn

Step 1: gather references (5–30 min depending on source)

You need one image per class. The filename becomes the label, so be deliberate:

~/my-collection/
├── blue_eyes_white_dragon.jpg
├── dark_magician.jpg
├── exodia_the_forbidden_one.jpg
└── ...

Naming rules:

  • The filename stem (minus extension) becomes the displayed label.
  • Optional set-code prefixes are auto-stripped: LOCH-JP001_dark_magician.jpg → Dark Magician.
  • Optional rarity suffixes are extracted as a separate field if they match one of: pscr, scr, ur, sr, op, utr, cr, ea, gmr. Example: dark_magician_pscr.jpg → name=Dark Magician, rarity=PScR.
  • Underscores become spaces, then title-cased.
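The naming rules above can be sketched as a small parser. This is an illustration only, not the package's actual code: `parse_label`, the `SET_CODE` pattern, and the upper-case fallback for rarity display are assumptions (the text only confirms that `pscr` is shown as `PScR`).

```python
import re
from pathlib import Path

# Rarity suffixes listed in the naming rules.
RARITIES = {"pscr", "scr", "ur", "sr", "op", "utr", "cr", "ea", "gmr"}
# Only "PScR" is confirmed by the text; other suffixes fall back to
# upper-case, which is an assumption.
RARITY_DISPLAY = {"pscr": "PScR"}
# Assumed shape of a set-code prefix such as "LOCH-JP001_".
SET_CODE = re.compile(r"^[A-Za-z0-9]+-[A-Za-z]*\d+_")


def parse_label(filename):
    """Return (display_name, rarity_or_None) for a reference filename."""
    stem = Path(filename).stem            # drop the extension
    stem = SET_CODE.sub("", stem)         # strip an optional set-code prefix
    parts = stem.split("_")
    rarity = None
    if len(parts) > 1 and parts[-1].lower() in RARITIES:
        key = parts.pop().lower()         # peel off the rarity suffix
        rarity = RARITY_DISPLAY.get(key, key.upper())
    # Underscores become spaces, then each word is capitalized.
    name = " ".join(word.capitalize() for word in parts if word)
    return name, rarity
```

Under these assumptions, `parse_label("LOCH-JP001_dark_magician.jpg")` gives `("Dark Magician", None)` and `parse_label("dark_magician_pscr.jpg")` gives `("Dark Magician", "PScR")`.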

Where to get reference images:

| Domain | Source |
| --- | --- |
| Trading cards | ygoprodeck (Yu-Gi-Oh!), Pokémon TCG API, Scryfall (MTG), yugipedia |
| Products | Amazon listing main image, manufacturer site |
| Art / paintings | Wikimedia Commons, museum APIs |
| Industrial parts | Manufacturer catalog scrapes |
| Faces | Selfies (with permission!) |
| Album covers | MusicBrainz cover art archive |
| Movie posters | TMDB API |

You can mix sources, e.g. include both English and Japanese versions of the same card under different filenames. The retrieval system treats them as separate references, and the cosine match picks whichever is closer to your live input.
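To make the "closest reference wins" behaviour concrete, here is a minimal pure-Python sketch of cosine-similarity retrieval. The three-dimensional vectors and the `identify` helper are made up for illustration; real CLIP embeddings are much wider and the package's own lookup may differ.

```python
import math


def unit(v):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def identify(query, references, top_k=3):
    """references: dict of name -> embedding. Returns [(name, cosine), ...], best first."""
    q = unit(query)
    scored = [
        (name, sum(a * b for a, b in zip(q, unit(emb))))
        for name, emb in references.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings: two language variants of one card plus an unrelated card.
refs = {
    "dark_magician_en": [0.9, 0.1, 0.2],
    "dark_magician_jp": [0.8, 0.3, 0.1],
    "blue_eyes_white_dragon": [0.1, 0.9, 0.3],
}
best = identify([0.82, 0.28, 0.12], refs)
```

Both variants score highly here, but the query sits nearer the Japanese reference, so that filename is returned first.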

Step 2: build the index (10 sec)

python3 -m data_label_factory.identify index \
    --refs ~/my-collection/ \
    --out my-index.npz

This CLIP-encodes every image and saves the embeddings to a single .npz file (~300 KB for 150 references). On Apple Silicon MPS this is ~50 ms per image, so 150 images take about 8 seconds.

Output: my-index.npz containing embeddings, names, filenames.
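The size and timing figures above can be sanity-checked with a little arithmetic. This sketch assumes 512-dimensional embeddings (the image embedding width of CLIP ViT-B/32) stored as float32, and ignores the small `names` and `filenames` arrays; the on-disk layout is an assumption.

```python
NUM_REFS = 150
EMBED_DIM = 512        # CLIP ViT-B/32 image embedding width
BYTES_PER_FLOAT = 4    # assumption: embeddings stored as float32
MS_PER_IMAGE = 50      # per-image encode time quoted for Apple Silicon MPS

index_bytes = NUM_REFS * EMBED_DIM * BYTES_PER_FLOAT
index_kib = index_bytes / 1024
encode_seconds = NUM_REFS * MS_PER_IMAGE / 1000

print(f"{index_kib:.0f} KiB, {encode_seconds:.1f} s")  # prints "300 KiB, 7.5 s"
```

Both numbers scale linearly with the number of references, so a 1,500-image catalog would be roughly 3 MB and 75 seconds under the same assumptions.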

Step 3: verify the index (5 sec)

python3 -m data_label_factory.identify verify --index my-index.npz

Self-tests every reference: each one should match itself as the top-1 result. Reports:

  • Top-1 self-identification rate (should be 100%)
  • Most-confusable pairs: references with high mutual similarity (visually similar items the model might confuse at runtime)
  • Margin analysis: the gap between "correct match" and "best wrong match" cosine scores. This is the strongest predictor of live accuracy.

Margin guidelines:

| Median margin | What it means | Action |
| --- | --- | --- |
| > 0.3 | Strong separation, live accuracy will be excellent | Ship it |
| 0.1 – 0.3 | Medium separation, expect some confusion on visually similar items | Consider Step 4 |
| < 0.1 | References look too similar to off-the-shelf CLIP | Run Step 4 (fine-tune) |
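The self-test and the margin guidelines can be illustrated with a short pure-Python sketch. It is not the `verify` implementation: `self_test` and `recommend` are hypothetical helpers, and the margin is taken to be each reference's self-similarity minus its best similarity to any other reference.

```python
import math
from statistics import median


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def self_test(embeddings):
    """Return (top-1 self-identification rate, median margin)."""
    refs = [unit(v) for v in embeddings]
    hits, margins = 0, []
    for i, query in enumerate(refs):
        sims = [sum(a * b for a, b in zip(query, r)) for r in refs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += best == i                  # did the reference match itself?
        best_wrong = max(s for j, s in enumerate(sims) if j != i)
        margins.append(sims[i] - best_wrong)
    return hits / len(refs), median(margins)


def recommend(median_margin):
    """Map a median margin onto the action column of the guidelines."""
    if median_margin > 0.3:
        return "Ship it"
    if median_margin >= 0.1:
        return "Consider Step 4"
    return "Run Step 4 (fine-tune)"


rate, margin = self_test([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
action = recommend(margin)
```

In this toy set the third vector sits halfway between the other two, so every margin is about 0.29 and the recommendation lands in the middle band.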

Step 4 (OPTIONAL): fine-tune the retrieval head (5–15 min)

If the verify output shows margin < 0.1, your domain (Yu-Gi-Oh! cards, MTG cards, similar-looking product variants, …) confuses generic CLIP. Fix it with a contrastive fine-tune:

python3 -m data_label_factory.identify train \
    --refs ~/my-collection/ \
    --out my-projection.pt \
    --epochs 12

What this does:

  • Loads frozen CLIP ViT-B/32
  • Trains a small projection head (~400k params) on top of CLIP features
  • Uses K-cards-per-batch sampling (16 distinct classes × 4 augmentations = 64-image batches)
  • Loss: SupCon (Khosla et al. 2020), which pulls augmentations of the same class together and pushes different classes apart
  • Augmentations: random crop, rotation ±20°, color jitter, perspective warp, Gaussian blur, occasional grayscale
  • Output: a 1.5 MB .pt file containing the projection head weights
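The batch layout and the loss named above can be sketched in plain Python. This is a didactic version that assumes L2-normalized feature vectors and a temperature of 0.1 (the temperature used by the package is not stated); a real training loop would compute the same quantity on tensors. Both function names are hypothetical.

```python
import math
import random


def sample_batch_labels(class_ids, classes_per_batch=16, views_per_class=4, rng=random):
    """Pick distinct classes, then repeat each once per augmented view."""
    chosen = rng.sample(list(class_ids), classes_per_batch)
    return [c for c in chosen for _ in range(views_per_class)]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over unit-length feature vectors."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits = {
            a: sum(x * y for x, y in zip(features[i], features[a])) / temperature
            for a in range(n) if a != i
        }
        # Log-sum-exp over every other sample, shifted by the max for stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total -= sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = sample_batch_labels(range(150))      # 16 classes x 4 views = 64 labels
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
loss_clustered = supcon_loss(feats, [0, 0, 1, 1])  # same-class vectors coincide
loss_scrambled = supcon_loss(feats, [0, 1, 0, 1])  # same-class vectors are orthogonal
```

The loss is near zero when same-class features already coincide and large when they point in different directions, which is the pressure that widens the margin.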

Reference run (150-class set, M4 Mac mini, MPS): 12 epochs in ~6 min. Margin improvement: 0.07 → 0.36 (5× wider).

Then re-build the index with the projection head:

python3 -m data_label_factory.identify index \
    --refs ~/my-collection/ \
    --out my-index.npz \
    --projection my-projection.pt

And re-verify to confirm the margin actually widened:

python3 -m data_label_factory.identify verify --index my-index.npz

Step 5: serve it as an HTTP endpoint (instant)

python3 -m data_label_factory.identify serve \
    --index my-index.npz \
    --refs ~/my-collection/ \
    --projection my-projection.pt \
    --port 8500

This starts a FastAPI server with:

  • POST /api/falcon: multipart image + query → JSON response in the same shape as mac_tensor's /api/falcon endpoint, so it's a drop-in replacement for any client that talks to mac_tensor (including the data-label-factory web/canvas/live UI).
  • GET /refs/<filename>: serves your reference images as a static mount so a browser UI can display "this is what the model thinks you're showing".
  • GET /health: JSON status with index size, projection state, request counter, etc.
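A quick way to confirm the server is up is to query the health endpoint from Python. The sketch below uses only the standard library; `endpoint` and `fetch_health` are hypothetical helpers, and the keys of the returned JSON are not specified here, so the result is simply returned as a dict.

```python
import json
import urllib.request


def endpoint(base_url, path):
    """Join a base URL and a path without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_health(base_url="http://localhost:8500", timeout=5.0):
    """GET /health and return the parsed JSON status."""
    with urllib.request.urlopen(endpoint(base_url, "/health"), timeout=timeout) as resp:
        return json.load(resp)


# Requires the server from Step 5 to be running:
# print(fetch_health())
```

The same `endpoint` helper yields the `/api/falcon` URL that the live tracker UI expects.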

Point the live tracker UI at it:

# In web/.env.local
FALCON_URL=http://localhost:8500/api/falcon

Then open http://localhost:3030/canvas/live and click Use Webcam.


Concrete examples

Trading cards (the original use case)

# Step 1: download reference images via the gather command
data_label_factory gather --project projects/yugioh.yaml --max-per-query 1
# → produces ~/data-label-factory/yugioh/positive/cards/*.jpg

# Step 2-5: build, verify, train, serve
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz
python3 -m data_label_factory.identify verify --index yugioh.npz
python3 -m data_label_factory.identify train --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh_proj.pt
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz --projection yugioh_proj.pt
python3 -m data_label_factory.identify serve --index yugioh.npz --refs ~/data-label-factory/yugioh/positive/cards/ --projection yugioh_proj.pt

Album covers ("Shazam for vinyl")

# Get reference images from MusicBrainz cover art archive (one per album)
mkdir -p ~/my-vinyl
# ... drop in jpgs named after the album ...
python3 -m data_label_factory.identify index --refs ~/my-vinyl --out vinyl.npz
python3 -m data_label_factory.identify serve --index vinyl.npz --refs ~/my-vinyl
# Hold up a record sleeve to your webcam → get the album back

Industrial parts catalog ("which screw is this?")

mkdir -p ~/parts
# Drop in one studio shot per part: m3_bolt_10mm.jpg, hex_nut_5mm.jpg, ...
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz
python3 -m data_label_factory.identify train --refs ~/parts --out parts_proj.pt --epochs 20
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz --projection parts_proj.pt
python3 -m data_label_factory.identify serve --index parts.npz --refs ~/parts --projection parts_proj.pt

Plant species ID

Same loop with reference images keyed by species name. You don't need PlantNet's scale to be useful for your garden.


Optional: live price feed (scrape_prices + UI integration)

If your reference images correspond to items with a market price (trading cards, collectibles, parts, etc.), you can plug in a live price feed and have the live tracker UI show the price next to each identified item.

How it works

scripts/scrape_prices_<your_site>.py            ← per-site adapter
        ↓
card_prices.json                                ← keyed by set code, contains JPY/USD/etc
        ↓
data_label_factory.identify serve --prices …    ← server loads it at startup
        ↓                                          + fetches live FX rate from open.er-api.com
{detection, price: {median, currency, usd_median}}  ← surfaced per detection
        ↓
web/canvas/live UI                              ← shows USD prominently in the
                                                  Active Tracks sidebar + a
                                                  Top Valuable Cards panel sorted
                                                  by USD descending
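The enrichment step in the diagram can be sketched as a pure function. The output keys `median`, `currency`, and `usd_median` come from the diagram; everything else is an assumption for illustration: the `set_code` key on the detection, the per-entry layout of card_prices.json, and the placeholder exchange rate (not a real quote).

```python
def attach_price(detection, prices, usd_per_unit):
    """Return a copy of `detection` with a price block when one is known."""
    entry = prices.get(detection.get("set_code"))
    if entry is None:
        return dict(detection)             # no price on file: pass through
    rate = usd_per_unit.get(entry["currency"])
    usd = round(entry["median"] * rate, 2) if rate is not None else None
    return {
        **detection,
        "price": {
            "median": entry["median"],
            "currency": entry["currency"],
            "usd_median": usd,
        },
    }


# Hypothetical data: one priced card and a made-up JPY -> USD rate.
prices = {"LOCH-JP001": {"median": 1500, "currency": "JPY"}}
rates = {"JPY": 0.01, "USD": 1.0}
priced = attach_price({"name": "Dark Magician", "set_code": "LOCH-JP001"}, prices, rates)
unpriced = attach_price({"name": "Unknown", "set_code": "NOPE-000"}, prices, rates)
```

Keeping the conversion in one place means the UI only ever reads `usd_median`, whatever currency the scraper produced.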

Built-in scraper: yuyu-tei.jp (Japanese OCG market)

python3 -m data_label_factory.identify scrape_prices \
    --refs ~/my-cards/ \
    --out card_prices.json \
    --site yuyu-tei

This is the example adapter. Add new sites by implementing a _scrape_<sitename>(prefixes) function in scrape_prices.py and wiring it into the dispatcher at the bottom of the file. The output schema is site-agnostic.

Live tracker UI features when prices are loaded

  • Per-detection price line in the Active Tracks sidebar: USD prominently, original currency underneath
  • Top Valuable Cards panel: fetched from a new /api/top-prices endpoint, sorted by USD descending, showing the N most valuable items in your set
  • Live FX rate: JPY/USD conversion fetched once at server startup from open.er-api.com (free, no auth)
  • Filename → name lookup: server builds a <set-code> → English display name map from your reference filenames so the top-prices panel can show human-readable names alongside the codes
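The ranking behind a "top valuable" panel is a sort plus a name lookup. The sketch below is hypothetical: the `top_prices` helper, the set codes, and the per-item layout are illustrative and are not the /api/top-prices response schema.

```python
def top_prices(priced_items, names, n=10):
    """Return the n most valuable items, highest USD value first.

    priced_items: set code -> {"usd_median": float or None}
    names:        set code -> human-readable display name
    """
    known = [
        (code, item) for code, item in priced_items.items()
        if item.get("usd_median") is not None   # skip items without a USD value
    ]
    known.sort(key=lambda pair: pair[1]["usd_median"], reverse=True)
    return [
        {"set_code": code, "name": names.get(code, code), "usd_median": item["usd_median"]}
        for code, item in known[:n]
    ]


# Hypothetical set codes and prices.
items = {
    "AAA-001": {"usd_median": 4.5},
    "AAA-002": {"usd_median": 120.0},
    "AAA-003": {"usd_median": None},
    "AAA-004": {"usd_median": 31.25},
}
display = {"AAA-002": "Dark Magician", "AAA-004": "Blue Eyes White Dragon"}
panel = top_prices(items, display, n=2)
```

Items with no display name fall back to their set code, so the panel still renders something for every entry.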

Add to Deck (localStorage-backed deck builder)

The live tracker also includes a + Add to Deck button on each active track. Clicking it:

  • Adds the identified card to a local deck (browser localStorage, no server state)
  • Triggers a green flash + scale animation on the button
  • Pulses the deck panel border bright emerald so you can see the card landed
  • Updates the running deck total in USD
  • Persists across page refreshes
  • Lets you remove individual items or clear the whole deck

This is a generic feature that works for any retrieval set β€” useful for "build a list of items I've identified" workflows beyond just card collecting (inventory taking, parts pulling, plant logging, …).


The data-label-factory loop, applied to retrieval

gather              (web search / API / phone photos)
   ↓
label               (the filename IS the label; naming convention does the work)
   ↓
verify              (data_label_factory.identify verify: self-test)
   ↓
train (optional)    (data_label_factory.identify train: fine-tune projection head)
   ↓
deploy              (data_label_factory.identify serve: HTTP endpoint)
   ↓
review              (data-label-factory web/canvas/live, which sees this server as a falcon backend)

Same loop, same conventions, just retrieval instead of detection.


Files in this folder

identify/
├── __init__.py             package marker + lazy import
├── __main__.py             enables `python3 -m data_label_factory.identify <cmd>`
├── cli.py                  argparse dispatcher for the four commands
├── train.py                Step 4: contrastive fine-tune
├── build_index.py          Step 2: CLIP encode + save index
├── verify_index.py         Step 3: self-test + margin analysis
├── serve.py                Step 5: FastAPI HTTP endpoint
└── README.md               you are here

Why this is lazy-loaded (not always-on)

The base data_label_factory package only depends on pyyaml, pillow, and requests, and is kept lightweight so users running the labeling pipeline don't pay any ML import cost. The identify subpackage adds heavy deps (torch, clip, ultralytics, fastapi) and is only loaded when explicitly invoked via python3 -m data_label_factory.identify <command>. Same opt-in pattern as the runpod subpackage.
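One common way to get this opt-in behaviour is to defer the import until a command is actually dispatched. The sketch below shows the general pattern with `importlib`. It is not the package's code: the command-to-module map is hypothetical, and standard-library modules stand in for the heavy submodules so the example runs anywhere.

```python
import importlib

# Hypothetical command -> module map. In a real package the values would be
# the heavy submodules; light stdlib modules are used here as stand-ins.
COMMAND_MODULES = {
    "index": "wave",
    "serve": "colorsys",
}


def load_command(name):
    """Import the module behind a command only when that command is requested."""
    try:
        module_name = COMMAND_MODULES[name]
    except KeyError:
        raise SystemExit(f"unknown command: {name}") from None
    return importlib.import_module(module_name)


module = load_command("serve")   # only now is the stand-in module imported
```

Because nothing heavy is imported at module top level, importing the dispatcher itself stays cheap.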

Install the heavy deps with the optional extra:

pip install -e ".[identify]"