# `data_label_factory.identify` – open-set image retrieval
The companion to the main labeling pipeline. Where the base
`data_label_factory` produces COCO labels for training a closed-set
**detector**, this subpackage produces a CLIP-based **retrieval index** for
open-set **identification** – given a known set of N reference images,
identify which one a webcam frame is showing.
**Use this when:**
- You have **1 image per class** (a product catalog, a card collection, an
  art portfolio, a parts diagram, …) and want a "what is this thing I'm
holding up?" tool.
- You want **zero training time** by default and the option to fine-tune for
more accuracy.
- You want to **add new items in seconds** by dropping a JPG in a folder
and re-indexing.
- You want **rarity / variant detection** for free – different prints of
the same item indexed under filenames that encode the variant.
**Use the base pipeline instead when:**
- You need to detect multiple object instances per image with bounding boxes
- Your objects appear in cluttered scenes and need a real detector
- You have many images per class and want a closed-set classifier
---
## The step-by-step blueprint (works for ANY image set)
This is the entire workflow. Replace `~/my-collection/` with your reference
folder and you're done.
### Step 0 – install (one-time, ~1 min)
```bash
pip install -e ".[identify]"
# This pulls torch, pillow, clip, fastapi, ultralytics, and uvicorn
```
### Step 1 – gather references (5–30 min depending on source)
You need **one image per class**. The filename becomes the label, so be
deliberate:
```
~/my-collection/
βββ blue_eyes_white_dragon.jpg
βββ dark_magician.jpg
βββ exodia_the_forbidden_one.jpg
βββ ...
```
**Naming rules:**
- The filename stem (minus extension) becomes the displayed label.
- Optional set-code prefixes are auto-stripped: `LOCH-JP001_dark_magician.jpg`
  → `Dark Magician`.
- Optional rarity suffixes are extracted as a separate field if they match
one of: `pscr`, `scr`, `ur`, `sr`, `op`, `utr`, `cr`, `ea`, `gmr`. Example:
  `dark_magician_pscr.jpg` → name=`Dark Magician`, rarity=`PScR`.
- Underscores become spaces, then title-cased.
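These rules are easy to sanity-check before indexing. Below is a minimal sketch of the convention in plain Python – a hypothetical helper, not the package's actual parser (the set-code regex is an assumption, and the real code also maps rarity codes to display casing such as `PScR`):

```python
import re
from typing import Optional, Tuple

# Rarity suffixes recognized by the convention (kept lowercase here)
RARITIES = {"pscr", "scr", "ur", "sr", "op", "utr", "cr", "ea", "gmr"}
# Assumed shape of a set-code prefix: letters/digits, a dash, more
# letters/digits, then an underscore (e.g. "LOCH-JP001_")
SET_CODE_RE = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9]+_")

def parse_ref_filename(filename: str) -> Tuple[str, Optional[str]]:
    """Turn a reference filename into (display name, rarity code or None)."""
    stem = filename.rsplit(".", 1)[0]      # drop the extension
    stem = SET_CODE_RE.sub("", stem)       # strip an optional set-code prefix
    parts = stem.split("_")
    rarity = None
    if parts[-1].lower() in RARITIES:      # pull off an optional rarity suffix
        rarity = parts.pop().lower()
    name = " ".join(parts).title()         # underscores -> spaces, title-case
    return name, rarity

parse_ref_filename("LOCH-JP001_dark_magician.jpg")  # -> ("Dark Magician", None)
parse_ref_filename("dark_magician_pscr.jpg")        # -> ("Dark Magician", "pscr")
```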
**Where to get reference images:**
| Domain | Source |
|---|---|
| Trading cards | ygoprodeck (Yu-Gi-Oh!), Pokémon TCG API, Scryfall (MTG), yugipedia |
| Products | Amazon listing main image, manufacturer site |
| Art / paintings | Wikimedia Commons, museum APIs |
| Industrial parts | Manufacturer catalog scrapes |
| Faces | Selfies (with permission!) |
| Album covers | MusicBrainz cover art archive |
| Movie posters | TMDB API |
**You can mix sources** – e.g. include both English and Japanese versions of
the same card under different filenames. The retrieval system treats them as
separate references but the cosine match will pick whichever is closer to
your live input.
### Step 2 – build the index (10 sec)
```bash
python3 -m data_label_factory.identify index \
--refs ~/my-collection/ \
--out my-index.npz
```
This CLIP-encodes every image and saves the embeddings to a single `.npz`
file (~300 KB for 150 references). On Apple Silicon MPS this is ~50 ms per
image, so 150 images take about 8 seconds.
**Output**: `my-index.npz` containing `embeddings`, `names`, `filenames`.
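At query time, identification is just cosine similarity between the query embedding and every row of `embeddings`, then an argmax. A dependency-free sketch of that lookup, with toy 3-d vectors standing in for 512-d CLIP embeddings (the real code works on the saved NumPy arrays):

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot product = cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def best_match(query, embeddings, names):
    """Cosine-score the query against every reference; return (name, score)."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(e))) for e in embeddings]
    i = max(range(len(scores)), key=scores.__getitem__)
    return names[i], scores[i]

# Toy 3-d "embeddings" in place of real CLIP vectors
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
names = ["dark_magician", "blue_eyes_white_dragon"]
best_match([0.9, 0.1, 0.0], refs, names)  # -> ("dark_magician", ~0.994)
```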
### Step 3 – verify the index (5 sec)
```bash
python3 -m data_label_factory.identify verify --index my-index.npz
```
Self-tests every reference: each one should match itself as the top-1
result. Reports:
- **Top-1 self-identification rate** (should be 100%)
- **Most-confusable pairs** – references with high mutual similarity
  (visually similar items the model might confuse at runtime)
- **Margin analysis** – the gap between "correct match" and "best wrong
match" cosine scores. **This is the strongest predictor of live accuracy.**
**Margin guidelines:**
| Median margin | What it means | Action |
|---|---|---|
| **> 0.3** | Strong separation, live accuracy will be excellent | Ship it |
| **0.1 – 0.3** | Medium separation, expect some confusion on visually similar items | Consider Step 4 |
| **< 0.1** | References look too similar to off-the-shelf CLIP | **Run Step 4** (fine-tune) |
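Concretely, the margin for reference *i* is its self-similarity minus its best similarity to any *other* reference. A stdlib-only sketch of that computation from a precomputed similarity matrix:

```python
from statistics import median

def self_test_margins(sim):
    """sim[i][j] = cosine similarity between reference embeddings i and j.
    Margin for i = similarity to itself minus best similarity to any OTHER ref."""
    margins = []
    for i, row in enumerate(sim):
        best_wrong = max(s for j, s in enumerate(row) if j != i)
        margins.append(row[i] - best_wrong)
    return margins

sim = [
    [1.00, 0.55, 0.40],
    [0.55, 1.00, 0.92],   # refs 1 and 2 are near-duplicates: tiny margin
    [0.40, 0.92, 1.00],
]
m = self_test_margins(sim)  # ~ [0.45, 0.08, 0.08]
median(m)                   # ~ 0.08 -> "< 0.1": run the fine-tune step
```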
### Step 4 (OPTIONAL) – fine-tune the retrieval head (5–15 min)
If the verify output shows margin < 0.1, your domain (Yu-Gi-Oh! cards, MTG
cards, similar-looking product variants, …) confuses generic CLIP. Fix it
with a contrastive fine-tune:
```bash
python3 -m data_label_factory.identify train \
--refs ~/my-collection/ \
--out my-projection.pt \
--epochs 12
```
**What this does:**
- Loads frozen CLIP ViT-B/32
- Trains a small **projection head** (~400k params) on top of CLIP features
- Uses **K-cards-per-batch sampling** (16 distinct classes × 4 augmentations
  = 64-image batches)
- Loss: **SupCon** (Khosla et al. 2020) – pulls augmentations of the same
  class together, pushes different classes apart
- Augmentations: random crop, rotation ±20°, color jitter, perspective warp,
  Gaussian blur, occasional grayscale
- Output: a **1.5 MB `.pt` file** containing the projection head weights
**Reference run** (150-class set, M4 Mac mini, MPS): 12 epochs in ~6 min.
Margin improvement: 0.07 → 0.36 (5× wider).
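The K-cards-per-batch scheme is what gives SupCon its positives and negatives: every batch must hold several augmented views of each sampled class. A stdlib-only sketch of that sampling (the real trainer applies the augmentations listed above to produce each view; function and parameter names here are illustrative):

```python
import random

def k_per_class_batches(class_ids, classes_per_batch=16, views_per_class=4, seed=0):
    """Yield batches of (class_id, view_index) pairs: each batch holds
    `classes_per_batch` distinct classes x `views_per_class` augmented views,
    so SupCon always sees positives (same class) and negatives (other classes)."""
    rng = random.Random(seed)
    ids = list(class_ids)
    rng.shuffle(ids)
    # Classes that don't fill a complete batch are dropped this epoch
    for start in range(0, len(ids) - classes_per_batch + 1, classes_per_batch):
        chunk = ids[start:start + classes_per_batch]
        yield [(cid, view) for cid in chunk for view in range(views_per_class)]

batches = list(k_per_class_batches(range(150)))
len(batches[0])                      # 16 x 4 = 64 items per batch
len({cid for cid, _ in batches[0]})  # 16 distinct classes per batch
```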
Then re-build the index with the projection head:
```bash
python3 -m data_label_factory.identify index \
--refs ~/my-collection/ \
--out my-index.npz \
--projection my-projection.pt
```
And re-verify to confirm the margin actually widened:
```bash
python3 -m data_label_factory.identify verify --index my-index.npz
```
### Step 5 – serve it as an HTTP endpoint (instant)
```bash
python3 -m data_label_factory.identify serve \
--index my-index.npz \
--refs ~/my-collection/ \
--projection my-projection.pt \
--port 8500
```
This starts a FastAPI server with:
- `POST /api/falcon` – multipart `image` + `query` → JSON response in the
  same shape as `mac_tensor`'s `/api/falcon` endpoint, so it's a drop-in
  replacement for any client that talks to mac_tensor (including the
  data-label-factory `web/canvas/live` UI).
- `GET /refs/<filename>` – serves your reference images as a static mount
  so a browser UI can display "this is what the model thinks you're showing".
- `GET /health` – JSON status with index size, projection state, request
  counter, etc.
**Point the live tracker UI at it:**
```bash
# In web/.env.local
FALCON_URL=http://localhost:8500/api/falcon
```
Then open `http://localhost:3030/canvas/live` and click **Use Webcam**.
---
## Concrete examples
### Trading cards (the original use case)
```bash
# Step 1: download reference images via the gather command
data_label_factory gather --project projects/yugioh.yaml --max-per-query 1
# → produces ~/data-label-factory/yugioh/positive/cards/*.jpg
# Step 2-5: build, verify, train, serve
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz
python3 -m data_label_factory.identify verify --index yugioh.npz
python3 -m data_label_factory.identify train --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh_proj.pt
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz --projection yugioh_proj.pt
python3 -m data_label_factory.identify serve --index yugioh.npz --refs ~/data-label-factory/yugioh/positive/cards/ --projection yugioh_proj.pt
```
### Album covers ("Shazam for vinyl")
```bash
# Get reference images from MusicBrainz cover art archive (one per album)
mkdir ~/my-vinyl
# ... drop in jpgs named after the album ...
python3 -m data_label_factory.identify index --refs ~/my-vinyl --out vinyl.npz
python3 -m data_label_factory.identify serve --index vinyl.npz --refs ~/my-vinyl
# Hold up a record sleeve to your webcam → get the album back
```
### Industrial parts catalog ("which screw is this?")
```bash
mkdir ~/parts
# Drop in one studio shot per part: m3_bolt_10mm.jpg, hex_nut_5mm.jpg, ...
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz
python3 -m data_label_factory.identify train --refs ~/parts --out parts_proj.pt --epochs 20
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz --projection parts_proj.pt
python3 -m data_label_factory.identify serve --index parts.npz --refs ~/parts --projection parts_proj.pt
```
### Plant species ID
Same loop with reference images keyed by species name. You don't need PlantNet's
scale to be useful for **your** garden.
---
## Optional: live price feed (`scrape_prices` + UI integration)
If your reference images correspond to items with a market price (trading cards,
collectibles, parts, etc), you can plug in a live price feed and have the live
tracker UI show the price next to each identified item.
### How it works
```
scripts/scrape_prices_<your_site>.py                 – per-site adapter
        ↓
card_prices.json                                     – keyed by set code, contains JPY/USD/etc
        ↓
data_label_factory.identify serve --prices …         – server loads it at startup
        ↓                                              + fetches live FX rate from open.er-api.com
{detection, price: {median, currency, usd_median}}   – surfaced per detection
        ↓
web/canvas/live UI                                   – shows USD prominently in the
                                                       Active Tracks sidebar + a
                                                       Top Valuable Cards panel sorted
                                                       by USD descending
```
### Built-in scraper: yuyu-tei.jp (Japanese OCG market)
```bash
python3 -m data_label_factory.identify scrape_prices \
--refs ~/my-cards/ \
--out card_prices.json \
--site yuyu-tei
```
This is the **example adapter**. Add new sites by implementing a
`_scrape_<sitename>(prefixes)` function in `scrape_prices.py` and wiring it
into the dispatcher at the bottom of the file. The output schema is
site-agnostic.
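As a rough illustration of the adapter shape, here is a hypothetical skeleton. The site data is fake, and the exact return contract and field names (beyond the `median`/`currency` fields mentioned above) are assumptions – check `scrape_prices.py` for the real interface:

```python
import json

def _scrape_example_site(prefixes):
    """Hypothetical adapter: given set-code prefixes derived from your
    reference filenames, return {set_code: {"median": price, "currency": code}}.
    A real adapter would fetch and parse the site's listing pages here."""
    fake_listings = {"LOCH-JP001": 1200, "LOCH-JP002": 350}  # stand-in data
    return {
        code: {"median": price, "currency": "JPY"}
        for code, price in fake_listings.items()
        if any(code.startswith(p) for p in prefixes)
    }

prices = _scrape_example_site(["LOCH"])
json.dumps(prices)  # roughly what card_prices.json would hold for these codes
```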
### Live tracker UI features when prices are loaded
- **Per-detection price line** in the Active Tracks sidebar – USD prominently,
  original currency underneath
- **Top Valuable Cards panel** – fetched from a new `/api/top-prices` endpoint,
  sorted by USD descending, showing the N most valuable items in your set
- **Live FX rate** – JPY/USD conversion fetched once at server startup from
  `open.er-api.com` (free, no auth)
- **Filename → name lookup** – server builds a `<set-code> → English display
  name` map from your reference filenames so the top-prices panel can show
  human-readable names alongside the codes
### Add to Deck (localStorage-backed deck builder)
The live tracker also includes a **`+ Add to Deck`** button on each active
track. Clicking it:
- Adds the identified card to a local deck (browser localStorage, no server state)
- Triggers a green flash + scale animation on the button
- Pulses the deck panel border bright emerald so you can see the card landed
- Updates the running deck total in USD
- Persists across page refreshes
- Lets you remove individual items or clear the whole deck
This is a generic feature that works for any retrieval set β useful for
"build a list of items I've identified" workflows beyond just card collecting
(inventory taking, parts pulling, plant logging, …).
---
## The data-label-factory loop, applied to retrieval
```
gather            (web search / API / phone photos)
   ↓
label             (the filename IS the label – naming convention does the work)
   ↓
verify            (data_label_factory.identify verify – self-test)
   ↓
train (optional)  (data_label_factory.identify train – fine-tune projection head)
   ↓
deploy            (data_label_factory.identify serve – HTTP endpoint)
   ↓
review            (data-label-factory web/canvas/live – sees this server as a falcon backend)
```
Same loop, same conventions, just **retrieval instead of detection**.
---
## Files in this folder
```
identify/
├── __init__.py       package marker + lazy import
├── __main__.py       enables `python3 -m data_label_factory.identify <cmd>`
├── cli.py            argparse dispatcher for the identify subcommands
├── train.py          Step 4: contrastive fine-tune
├── build_index.py    Step 2: CLIP encode + save index
├── verify_index.py   Step 3: self-test + margin analysis
├── serve.py          Step 5: FastAPI HTTP endpoint
└── README.md         you are here
```
---
## Why this is **lazy-loaded** (not always-on)
The base `data_label_factory` package only depends on `pyyaml`, `pillow`, and
`requests` – kept lightweight so users running the labeling pipeline don't
pay any ML import cost. The `identify` subpackage adds heavy deps (torch,
clip, ultralytics, fastapi) and is only loaded when explicitly invoked via
`python3 -m data_label_factory.identify <command>`. Same opt-in pattern as
the `runpod` subpackage.
Install the heavy deps with the optional extra:
```bash
pip install -e ".[identify]"
```