waltgrace committed commit 37ddbc1 (verified) · 1 parent: 6060fb2

feat(identify): open-set image retrieval subpackage
CLAUDE.md CHANGED
@@ -190,6 +190,30 @@ QWEN_URL=http://192.168.1.244:8291 data_label_factory status
 
 ---
 
+## Optional: open-set identification (`data_label_factory.identify`)
+
+If a user wants to **identify** which one of N known things they're holding
+up to a webcam (rather than detect arbitrary objects), point them at the
+identify subpackage. It's a CLIP retrieval index — needs only 1 image per
+class, no training required.
+
+```bash
+pip install -e ".[identify]"
+python3 -m data_label_factory.identify index --refs ~/my-things/ --out my.npz
+python3 -m data_label_factory.identify verify --index my.npz
+# (optional) python3 -m data_label_factory.identify train --refs ~/my-things/ --out my-proj.pt
+python3 -m data_label_factory.identify serve --index my.npz --refs ~/my-things/
+# → web/canvas/live talks to it via FALCON_URL=http://localhost:8500/api/falcon
+```
+
+The full blueprint for any image set is at
+`data_label_factory/identify/README.md`. **This is the right tool for
+"trading cards / products / album covers / parts catalog identification"
+use cases. The base data_label_factory pipeline is for closed-set bbox
+detection.**
+
+---
+
 ## Optional GPU path
 
 If a user has more than ~10k images and wants the run to finish in minutes
README.md CHANGED
@@ -272,6 +272,41 @@ runpod is just an option.
 
 ---
 
+## Optional: open-set image identification
+
+The base pipeline produces COCO labels for training a closed-set **detector**.
+The opt-in `data_label_factory.identify` subpackage produces a CLIP retrieval
+**index** for open-set identification — given a known set of N reference images,
+identify which one a webcam frame is showing. **Use it when you have 1 image
+per class and want zero training time.**
+
+```bash
+pip install -e ".[identify]"
+
+# Build an index from a folder of references
+python3 -m data_label_factory.identify index --refs ~/my-cards/ --out my.npz
+
+# Optional: contrastive fine-tune for fine-grained accuracy (~5 min on M4 MPS)
+python3 -m data_label_factory.identify train --refs ~/my-cards/ --out my-proj.pt
+python3 -m data_label_factory.identify index --refs ~/my-cards/ --out my.npz --projection my-proj.pt
+
+# Self-test the index
+python3 -m data_label_factory.identify verify --index my.npz
+
+# Serve as a mac_tensor-shaped /api/falcon endpoint
+python3 -m data_label_factory.identify serve --index my.npz --refs ~/my-cards/
+# → web/canvas/live can hit it with FALCON_URL=http://localhost:8500/api/falcon
+```
+
+Built-in **rarity / variant detection** for free — if your filenames encode a
+suffix like `_pscr`, `_scr`, `_ur`, the matched filename's suffix becomes a
+separate `rarity` field on the response. See
+[`data_label_factory/identify/README.md`](data_label_factory/identify/README.md)
+for the full blueprint and concrete examples (trading cards, album covers,
+industrial parts, plant species, …).
+
+---
+
 ## Configuration reference
 
 ### Environment variables
data_label_factory/identify/README.md ADDED
@@ -0,0 +1,288 @@
# `data_label_factory.identify` — open-set image retrieval

The companion to the main labeling pipeline. Where the base
`data_label_factory` produces COCO labels for training a closed-set
**detector**, this subpackage produces a CLIP-based **retrieval index** for
open-set **identification** — given a known set of N reference images,
identify which one a webcam frame is showing.

**Use this when:**

- You have **1 image per class** (a product catalog, a card collection, an
  art portfolio, a parts diagram, …) and want a "what is this thing I'm
  holding up?" tool.
- You want **zero training time** by default and the option to fine-tune for
  more accuracy.
- You want to **add new items in seconds** by dropping a JPG in a folder
  and re-indexing.
- You want **rarity / variant detection** for free — different prints of
  the same item indexed under filenames that encode the variant.

**Use the base pipeline instead when:**

- You need to detect multiple object instances per image with bounding boxes
- Your objects appear in cluttered scenes and need a real detector
- You have many images per class and want a closed-set classifier

---

## The 4-step blueprint (works for ANY image set)

This is the entire workflow. Replace `~/my-collection/` with your reference
folder and you're done.

### Step 0 — install (one-time, ~1 min)

```bash
pip install -e ".[identify]"
# This pulls torch, pillow, clip, fastapi, ultralytics, and uvicorn
```

### Step 1 — gather references (5–30 min depending on source)

You need **one image per class**. The filename becomes the label, so be
deliberate:

```
~/my-collection/
├── blue_eyes_white_dragon.jpg
├── dark_magician.jpg
├── exodia_the_forbidden_one.jpg
└── ...
```

**Naming rules:**

- The filename stem (minus extension) becomes the displayed label.
- Optional set-code prefixes are auto-stripped: `LOCH-JP001_dark_magician.jpg`
  → `Dark Magician`.
- Optional rarity suffixes are extracted as a separate field if they match
  one of: `pscr`, `scr`, `ur`, `sr`, `op`, `utr`, `cr`, `ea`, `gmr`. Example:
  `dark_magician_pscr.jpg` → name=`Dark Magician`, rarity=`PScR`.
- Underscores become spaces, then title-cased.
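The naming rules above can be sketched as a tiny parser. This is an illustrative helper, not part of the package (`parse_ref_filename` is a hypothetical name; the set-code regex mirrors the one used by the indexer, and rarity is stripped before title-casing so the result matches the examples above):

```python
import re

# Rarity suffixes from the naming rules above, mapped to display form
RARITY = {"pscr": "PScR", "scr": "ScR", "ur": "UR", "sr": "SR",
          "op": "OP", "utr": "UtR", "cr": "CR", "ea": "EA", "gmr": "GMR"}

def parse_ref_filename(fname: str) -> tuple[str, str]:
    """Filename -> (display name, rarity) per the naming convention."""
    stem = fname.rsplit(".", 1)[0]
    stem = re.sub(r"^[A-Z]+-[A-Z]+\d+_", "", stem)   # strip set-code prefix
    parts = stem.split("_")
    rarity = ""
    if parts and parts[-1].lower() in RARITY:        # optional rarity suffix
        rarity = RARITY[parts[-1].lower()]
        parts = parts[:-1]
    return " ".join(parts).title(), rarity           # underscores -> spaces, title-case
```

So `LOCH-JP001_dark_magician_pscr.jpg` parses to `("Dark Magician", "PScR")`.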
**Where to get reference images:**

| Domain | Source |
|---|---|
| Trading cards | ygoprodeck (Yu-Gi-Oh!), Pokémon TCG API, Scryfall (MTG), yugipedia |
| Products | Amazon listing main image, manufacturer site |
| Art / paintings | Wikimedia Commons, museum APIs |
| Industrial parts | Manufacturer catalog scrapes |
| Faces | Selfies (with permission!) |
| Album covers | MusicBrainz cover art archive |
| Movie posters | TMDB API |

**You can mix sources** — e.g. include both English and Japanese versions of
the same card under different filenames. The retrieval system treats them as
separate references but the cosine match will pick whichever is closer to
your live input.

### Step 2 — build the index (10 sec)

```bash
python3 -m data_label_factory.identify index \
  --refs ~/my-collection/ \
  --out my-index.npz
```

This CLIP-encodes every image and saves the embeddings to a single `.npz`
file (~300 KB for 150 references). On Apple Silicon MPS this is ~50 ms per
image — 150 images take about 8 seconds.

**Output**: `my-index.npz` containing `embeddings`, `names`, `filenames`.
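Consuming that file takes a few lines. A minimal sketch assuming only the three arrays described above (`top_match` is an illustrative helper, not package API; dot product equals cosine because the rows are L2-normalized):

```python
import numpy as np

def top_match(index_path: str, query: np.ndarray) -> tuple[str, float]:
    """Return (name, cosine score) of the best match for a normalized query embedding."""
    npz = np.load(index_path, allow_pickle=True)   # names are object arrays
    sims = npz["embeddings"] @ query               # (N,) cosine similarities
    best = int(np.argmax(sims))
    return str(npz["names"][best]), float(sims[best])
```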
### Step 3 — verify the index (5 sec)

```bash
python3 -m data_label_factory.identify verify --index my-index.npz
```

Self-tests every reference: each one should match itself as the top-1
result. Reports:

- **Top-1 self-identification rate** (should be 100%)
- **Most-confusable pairs** — references with high mutual similarity
  (visually similar items the model might confuse at runtime)
- **Margin analysis** — the gap between "correct match" and "best wrong
  match" cosine scores. **This is the strongest predictor of live accuracy.**

**Margin guidelines:**

| Median margin | What it means | Action |
|---|---|---|
| **> 0.3** | Strong separation, live accuracy will be excellent | Ship it |
| **0.1 – 0.3** | Medium separation, expect some confusion on visually similar items | Consider Step 4 |
| **< 0.1** | References look too similar to off-the-shelf CLIP | **Run Step 4** (fine-tune) |
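The margin statistic can be reproduced from the embedding matrix alone. A sketch (the helper name is hypothetical, but the definition follows the description above: self-similarity, 1.0 for L2-normalized rows, minus the best wrong-match cosine):

```python
import numpy as np

def self_test_margins(emb: np.ndarray) -> np.ndarray:
    """Per-reference margin = cosine with itself minus best cosine with any other reference."""
    sims = emb @ emb.T                  # (N, N) pairwise cosines
    off = sims.copy()
    np.fill_diagonal(off, -np.inf)      # exclude self from "best wrong match"
    return np.diag(sims) - off.max(axis=1)
```

`np.median(self_test_margins(emb))` is the number the table above thresholds on.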
### Step 4 (OPTIONAL) — fine-tune the retrieval head (5–15 min)

If the verify output shows margin < 0.1, your domain (yugioh cards, MTG
cards, similar-looking product variants, …) confuses generic CLIP. Fix it
with a contrastive fine-tune:

```bash
python3 -m data_label_factory.identify train \
  --refs ~/my-collection/ \
  --out my-projection.pt \
  --epochs 12
```

**What this does:**

- Loads frozen CLIP ViT-B/32
- Trains a small **projection head** (~400k params) on top of CLIP features
- Uses **K-cards-per-batch sampling** (16 distinct classes × 4 augmentations
  = 64-image batches)
- Loss: **SupCon** (Khosla et al. 2020) — pulls augmentations of the same
  class together, pushes different classes apart
- Augmentations: random crop, rotation ±20°, color jitter, perspective warp,
  Gaussian blur, occasional grayscale
- Output: a **1.5 MB `.pt` file** containing the projection head weights
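The SupCon objective in the list above can be sketched in a few lines. This is a minimal version of the Khosla et al. loss for illustration, not the package's actual `train.py`:

```python
import torch

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss over L2-normalized projections z (B, D)."""
    sim = z @ z.T / temp                                        # pairwise cosine / temperature
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye   # positives: same class, not self
    sim = sim.masked_fill(eye, float("-inf"))                   # drop self from the denominator
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)         # log-softmax over the others
    # mean log-likelihood of each anchor's positives, negated
    loss = -(log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1))
    return loss.mean()
```

Augmentations of the same class act as the positives, which is why the batch sampler draws several views per class.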
**Reference run** (150-class set, M4 Mac mini, MPS): 12 epochs in ~6 min.
Margin improvement: 0.07 → 0.36 (5× wider).

Then re-build the index with the projection head:

```bash
python3 -m data_label_factory.identify index \
  --refs ~/my-collection/ \
  --out my-index.npz \
  --projection my-projection.pt
```

And re-verify to confirm the margin actually widened:

```bash
python3 -m data_label_factory.identify verify --index my-index.npz
```

### Step 5 — serve it as an HTTP endpoint (instant)

```bash
python3 -m data_label_factory.identify serve \
  --index my-index.npz \
  --refs ~/my-collection/ \
  --projection my-projection.pt \
  --port 8500
```

This starts a FastAPI server with:

- `POST /api/falcon` — multipart `image` + `query` → JSON response in the
  same shape as `mac_tensor`'s `/api/falcon` endpoint, so it's a drop-in
  replacement for any client that talks to mac_tensor (including the
  data-label-factory `web/canvas/live` UI).
- `GET /refs/<filename>` — serves your reference images as a static mount
  so a browser UI can display "this is what the model thinks you're showing".
- `GET /health` — JSON status with index size, projection state, request
  counter, etc.

**Point the live tracker UI at it:**

```bash
# In web/.env.local
FALCON_URL=http://localhost:8500/api/falcon
```

Then open `http://localhost:3030/canvas/live` and click **Use Webcam**.

---

## Concrete examples

### Trading cards (the original use case)

```bash
# Step 1: download reference images via the gather command
data_label_factory gather --project projects/yugioh.yaml --max-per-query 1
# → produces ~/data-label-factory/yugioh/positive/cards/*.jpg

# Steps 2–5: build, verify, train, serve
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz
python3 -m data_label_factory.identify verify --index yugioh.npz
python3 -m data_label_factory.identify train --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh_proj.pt
python3 -m data_label_factory.identify index --refs ~/data-label-factory/yugioh/positive/cards/ --out yugioh.npz --projection yugioh_proj.pt
python3 -m data_label_factory.identify serve --index yugioh.npz --refs ~/data-label-factory/yugioh/positive/cards/ --projection yugioh_proj.pt
```

### Album covers ("Shazam for vinyl")

```bash
# Get reference images from MusicBrainz cover art archive (one per album)
mkdir ~/my-vinyl
# ... drop in jpgs named after the album ...
python3 -m data_label_factory.identify index --refs ~/my-vinyl --out vinyl.npz
python3 -m data_label_factory.identify serve --index vinyl.npz --refs ~/my-vinyl
# Hold up a record sleeve to your webcam → get the album back
```

### Industrial parts catalog ("which screw is this?")

```bash
mkdir ~/parts
# Drop in one studio shot per part: m3_bolt_10mm.jpg, hex_nut_5mm.jpg, ...
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz
python3 -m data_label_factory.identify train --refs ~/parts --out parts_proj.pt --epochs 20
python3 -m data_label_factory.identify index --refs ~/parts --out parts.npz --projection parts_proj.pt
python3 -m data_label_factory.identify serve --index parts.npz --refs ~/parts --projection parts_proj.pt
```

### Plant species ID

Same loop with reference images keyed by species name. You don't need PlantNet's
scale to be useful for **your** garden.

---

## The data-label-factory loop, applied to retrieval

```
gather            (web search / API / phone photos)

label             (the filename IS the label — naming convention does the work)

verify            (data_label_factory.identify verify — self-test)

train (optional)  (data_label_factory.identify train — fine-tune projection head)

deploy            (data_label_factory.identify serve — HTTP endpoint)

review            (data-label-factory web/canvas/live — sees this server as a falcon backend)
```

Same loop, same conventions, just **retrieval instead of detection**.

---

## Files in this folder

```
identify/
├── __init__.py       package marker + lazy import
├── __main__.py       enables `python3 -m data_label_factory.identify <cmd>`
├── cli.py            argparse dispatcher for the four commands
├── train.py          Step 4: contrastive fine-tune
├── build_index.py    Step 2: CLIP encode + save index
├── verify_index.py   Step 3: self-test + margin analysis
├── serve.py          Step 5: FastAPI HTTP endpoint
└── README.md         you are here
```

---

## Why this is **lazy-loaded** (not always-on)

The base `data_label_factory` package only depends on `pyyaml`, `pillow`, and
`requests` — kept lightweight so users running the labeling pipeline don't
pay any ML import cost. The `identify` subpackage adds heavy deps (torch,
clip, ultralytics, fastapi) and is only loaded when explicitly invoked via
`python3 -m data_label_factory.identify <command>`. Same opt-in pattern as
the `runpod` subpackage.

Install the heavy deps with the optional extra:

```bash
pip install -e ".[identify]"
```
data_label_factory/identify/__init__.py ADDED
@@ -0,0 +1,47 @@
"""data_label_factory.identify — open-set retrieval / card identification.

The companion to the bbox-grounding pipeline. Where the main `data_label_factory`
CLI produces COCO labels for training a closed-set detector, this subpackage
produces a CLIP-based retrieval index for open-set identification.

Use it when you have a known set of N reference images (cards, products, parts,
artworks, etc) and want to identify which one a webcam frame is showing — with
a single reference image per class and zero training time.

Pipeline stages
---------------

    references/        ← user provides 1 image per class
        ↓
    train_identifier   ← optional: contrastive fine-tune of a small
        ↓                projection head on top of frozen CLIP
    clip_proj.pt
        ↓
    build_index        ← CLIP-encode each reference + apply projection
        ↓                head, save embeddings to .npz
    card_index.npz
        ↓
    verify_index       ← self-test: each reference should match itself
        ↓                as top-1 with high cosine similarity
    serve_identifier   ← HTTP server (mac_tensor /api/falcon-shaped)
        ↓                that the live tracker UI talks to
    /api/falcon

This is the data-label-factory loop applied to retrieval instead of detection.

CLI
---

    python3 -m data_label_factory.identify train --refs limit-over-pack/ --out clip_proj.pt
    python3 -m data_label_factory.identify index --refs limit-over-pack/ --proj clip_proj.pt --out card_index.npz
    python3 -m data_label_factory.identify verify --index card_index.npz --refs limit-over-pack/
    python3 -m data_label_factory.identify serve --index card_index.npz --refs limit-over-pack/ --port 8500
"""

__all__ = ["main"]


def main():
    """Lazy entry point — only imports the heavy ML deps if user invokes the CLI."""
    from .cli import main as _main
    return _main()
data_label_factory/identify/__main__.py ADDED
@@ -0,0 +1,7 @@
"""Enables `python3 -m data_label_factory.identify <command>`."""

import sys

from .cli import main

if __name__ == "__main__":
    sys.exit(main())
data_label_factory/identify/build_index.py ADDED
@@ -0,0 +1,133 @@
"""Build a CLIP retrieval index from a folder of reference images.

Each image's filename becomes its display label (with set-code prefixes
stripped and rarity suffixes preserved). Optionally applies a fine-tuned
projection head produced by `data_label_factory.identify train`.

The output `.npz` contains three arrays:
    embeddings  (N, D)  L2-normalized
    names       (N,)    cleaned display names
    filenames   (N,)    original filenames (so the server can serve refs)
"""

from __future__ import annotations

import argparse
import os
import re
import sys
from pathlib import Path


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        prog="data_label_factory.identify index",
        description=(
            "Encode every image in a reference folder with CLIP (optionally "
            "passed through a fine-tuned projection head) and save the embeddings "
            "as a searchable .npz index."
        ),
    )
    parser.add_argument("--refs", required=True, help="Folder of reference images")
    parser.add_argument("--out", default="card_index.npz", help="Output .npz path")
    parser.add_argument("--projection", default=None,
                        help="Optional fine-tuned projection head .pt (from `train`)")
    parser.add_argument("--clip-model", default="ViT-B/32")
    args = parser.parse_args(argv)

    try:
        import numpy as np
        import torch
        import torch.nn as nn
        import torch.nn.functional as F
        from PIL import Image
        import clip
    except ImportError as e:
        raise SystemExit(
            f"missing dependency: {e}\n"
            "install with:\n"
            "  pip install torch pillow git+https://github.com/openai/CLIP.git"
        )

    DEVICE = ("mps" if torch.backends.mps.is_available()
              else "cuda" if torch.cuda.is_available() else "cpu")
    print(f"[index] device={DEVICE}", flush=True)

    refs = Path(args.refs)
    if not refs.is_dir():
        raise SystemExit(f"refs folder not found: {refs}")

    print(f"[index] loading CLIP {args.clip_model} …", flush=True)
    model, preprocess = clip.load(args.clip_model, device=DEVICE)
    model.eval()

    head = None
    if args.projection and os.path.exists(args.projection):
        print(f"[index] loading projection head from {args.projection}", flush=True)

        class ProjectionHead(nn.Module):
            def __init__(self, in_dim=512, hidden=512, out_dim=256):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim))

            def forward(self, x):
                return F.normalize(self.net(x), dim=-1)

        ckpt = torch.load(args.projection, map_location=DEVICE)
        sd = ckpt.get("state_dict", ckpt)
        head = ProjectionHead(
            in_dim=ckpt.get("in_dim", 512),
            hidden=ckpt.get("hidden", 512),
            out_dim=ckpt.get("out_dim", 256),
        ).to(DEVICE)
        head.load_state_dict(sd)
        head.eval()
        print(f"[index] out_dim={ckpt.get('out_dim', 256)}", flush=True)

    files = sorted(f for f in os.listdir(refs)
                   if f.lower().endswith((".jpg", ".jpeg", ".png", ".webp")))
    if not files:
        raise SystemExit(f"no images in {refs}")
    print(f"[index] {len(files)} reference images", flush=True)

    embeddings, names, filenames = [], [], []
    for i, fname in enumerate(files, 1):
        path = refs / fname
        # Strip set-code prefix (e.g. "LOCH-JP001_") and clean up underscores
        stem = os.path.splitext(fname)[0]
        stem = re.sub(r"^[A-Z]+-[A-Z]+\d+_", "", stem)
        name = stem.replace("_", " ").title()
        # "Pharaoh S Servant" → "Pharaoh's Servant"
        name = re.sub(r"\b(\w+) S\b", r"\1's", name)
        try:
            img = Image.open(path).convert("RGB")
        except Exception as e:
            print(f"[index] skip {fname}: {e}", flush=True)
            continue
        with torch.no_grad():
            tensor = preprocess(img).unsqueeze(0).to(DEVICE)
            feat = model.encode_image(tensor).float()
            feat = feat / feat.norm(dim=-1, keepdim=True)
            if head is not None:
                feat = head(feat)
        embeddings.append(feat.cpu().numpy()[0].astype(np.float32))
        names.append(name)
        filenames.append(fname)
        if i % 25 == 0 or i == len(files):
            print(f"[index] [{i:3d}/{len(files)}] {name[:50]}", flush=True)

    emb = np.stack(embeddings, axis=0)
    out = Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)
    np.savez(out,
             embeddings=emb,
             names=np.array(names, dtype=object),
             filenames=np.array(filenames, dtype=object))
    print(f"\n[index] ✓ wrote {out} ({emb.shape[0]} refs × {emb.shape[1]} dims, "
          f"{out.stat().st_size / 1024:.1f} KB)", flush=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
data_label_factory/identify/cli.py ADDED
@@ -0,0 +1,62 @@
"""CLI dispatcher for `python3 -m data_label_factory.identify <command>`.

Subcommands:
    index   → build_index.main
    verify  → verify_index.main
    train   → train.main
    serve   → serve.main

Each is lazy-loaded so users only pay the import cost for the command they
actually invoke.
"""

from __future__ import annotations

import sys


HELP = """\
data_label_factory.identify — open-set image retrieval

usage: python3 -m data_label_factory.identify <command> [options]

commands:
  index    Build a CLIP retrieval index from a folder of reference images
  verify   Self-test an index and report margin / confusable pairs
  train    Contrastive fine-tune a projection head (improves accuracy)
  serve    Run an HTTP server that exposes the index as /api/falcon

run any command with --help for its options. The full blueprint is in
data_label_factory/identify/README.md.
"""


def main(argv: list[str] | None = None) -> int:
    args = list(argv) if argv is not None else sys.argv[1:]
    if not args or args[0] in ("-h", "--help", "help"):
        print(HELP)
        return 0

    cmd = args[0]
    rest = args[1:]

    if cmd == "index":
        from .build_index import main as _main
        return _main(rest)
    if cmd == "verify":
        from .verify_index import main as _main
        return _main(rest)
    if cmd == "train":
        from .train import main as _main
        return _main(rest)
    if cmd == "serve":
        from .serve import main as _main
        return _main(rest)

    print(f"unknown command: {cmd}\n", file=sys.stderr)
    print(HELP, file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main())
data_label_factory/identify/serve.py ADDED
@@ -0,0 +1,309 @@
"""HTTP server that serves a CLIP retrieval index over a mac_tensor-shaped
/api/falcon endpoint. Compatible with the existing data-label-factory web UI
(`web/canvas/live`) without any client changes.

Architecture per request:
  1. YOLOv8-World detects "card-shaped" regions (open-vocab "card" class)
  2. Each region is cropped, CLIP-encoded, optionally projection-headed
  3. Cosine-matched against the loaded index → top match per region
  4. If YOLO finds nothing, falls back to classifying the center crop
  5. Returns mac_tensor /api/falcon-shaped JSON so the existing proxy works

Also serves the reference images at /refs/<filename> so the live tracker UI
can show "this is what the model thinks you're holding" alongside the webcam.

Configurable via env vars:
  CARD_INDEX             path to .npz from `index` (default: card_index.npz)
  CLIP_PROJ              optional path to projection head .pt (default: clip_proj.pt)
  REFS_DIR               folder of reference images served at /refs/ (default: limit-over-pack)
  YOLO_CONF              YOLO confidence threshold (default: 0.05)
  CLIP_SIM_THRESHOLD     minimum cosine to accept a match (default: 0.70)
  CLIP_MARGIN_THRESHOLD  minimum top1−top2 cosine gap to be 'confident' (default: 0.04)
  PORT                   HTTP port (default: 8500)
"""

from __future__ import annotations

import argparse
import io
import os
import sys
import threading
import time
import traceback
from typing import Any


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        prog="data_label_factory.identify serve",
        description=(
            "Run a mac_tensor-shaped /api/falcon HTTP server that serves a CLIP "
            "retrieval index. Compatible with the existing data-label-factory "
            "web/canvas/live UI without client changes."
        ),
    )
    parser.add_argument("--index", default=os.environ.get("CARD_INDEX", "card_index.npz"),
                        help="Path to the .npz index built by `index`")
    parser.add_argument("--projection", default=os.environ.get("CLIP_PROJ", "clip_proj.pt"),
                        help="Path to the .pt projection head from `train` (optional)")
    parser.add_argument("--refs", default=os.environ.get("REFS_DIR", "limit-over-pack"),
                        help="Folder of reference images, served at /refs/")
    parser.add_argument("--port", type=int, default=int(os.environ.get("PORT", "8500")))
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--sim-threshold", type=float,
                        default=float(os.environ.get("CLIP_SIM_THRESHOLD", "0.70")))
    parser.add_argument("--margin-threshold", type=float,
                        default=float(os.environ.get("CLIP_MARGIN_THRESHOLD", "0.04")))
    parser.add_argument("--yolo-conf", type=float,
                        default=float(os.environ.get("YOLO_CONF", "0.05")))
    parser.add_argument("--no-yolo", action="store_true",
                        help="Skip YOLO detection entirely; always classify the center crop")
    args = parser.parse_args(argv)

    try:
        import numpy as np
        import torch
        import torch.nn as nn
        import torch.nn.functional as F
        from PIL import Image
        from fastapi import FastAPI, UploadFile, File, Form, HTTPException
        from fastapi.responses import JSONResponse, PlainTextResponse
        from fastapi.middleware.cors import CORSMiddleware
        from fastapi.staticfiles import StaticFiles
        import uvicorn
        import clip
    except ImportError as e:
        raise SystemExit(
            f"missing dependency: {e}\n"
            "install with:\n"
            "  pip install fastapi 'uvicorn[standard]' python-multipart pillow torch "
            "git+https://github.com/openai/CLIP.git\n"
            "  (and `pip install ultralytics` if you want YOLO detection)"
        )

    DEVICE = ("mps" if torch.backends.mps.is_available()
              else "cuda" if torch.cuda.is_available() else "cpu")
    print(f"[serve] device={DEVICE}", flush=True)

    # ---------- load CLIP + projection head ----------
    print("[serve] loading CLIP ViT-B/32 …", flush=True)
    clip_model, clip_preprocess = clip.load("ViT-B/32", device=DEVICE)
    clip_model.eval()

    proj_head = None
    if args.projection and os.path.exists(args.projection):
        class ProjectionHead(nn.Module):
            def __init__(self, in_dim=512, hidden=512, out_dim=256):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim))

            def forward(self, x):
                return F.normalize(self.net(x), dim=-1)

        ckpt = torch.load(args.projection, map_location=DEVICE)
        sd = ckpt.get("state_dict", ckpt)
        proj_head = ProjectionHead(
            in_dim=ckpt.get("in_dim", 512),
            hidden=ckpt.get("hidden", 512),
            out_dim=ckpt.get("out_dim", 256),
        ).to(DEVICE)
        proj_head.load_state_dict(sd)
        proj_head.eval()
        print(f"[serve] loaded fine-tuned projection head from {args.projection}", flush=True)
    else:
        print("[serve] no projection head — using raw CLIP features", flush=True)

    # ---------- load index ----------
    if not os.path.exists(args.index):
        raise SystemExit(f"index not found: {args.index}\n"
                         f"build one with: data_label_factory.identify index --refs <folder>")
    npz = np.load(args.index, allow_pickle=True)
    CARD_EMB = npz["embeddings"]
    CARD_NAMES = list(npz["names"])
    CARD_FILES = list(npz["filenames"]) if "filenames" in npz.files else ["" for _ in CARD_NAMES]
    print(f"[serve] loaded {len(CARD_NAMES)} refs from {args.index}", flush=True)

    # ---------- optional YOLO for multi-card detection ----------
    yolo = None
    if not args.no_yolo:
        try:
            from ultralytics import YOLO
            print("[serve] loading YOLOv8s-world for card detection …", flush=True)
            yolo = YOLO("yolov8s-world.pt")
            yolo.set_classes(["card", "trading card", "playing card"])
            print(f"[serve] yolo ready (device={yolo.device})", flush=True)
        except Exception as e:
            print(f"[serve] YOLO unavailable ({e}); using whole-frame mode only", flush=True)

    # ---------- helpers ----------
    RARITY_SUFFIXES = {
        "pscr": "PScR", "scr": "ScR", "ur": "UR", "sr": "SR",
        "op": "OP", "utr": "UtR", "cr": "CR", "ea": "EA", "gmr": "GMR",
    }

    def _split_name_and_rarity(full: str) -> tuple[str, str]:
        parts = full.split()
        if parts and parts[-1].lower() in RARITY_SUFFIXES:
            return " ".join(parts[:-1]), RARITY_SUFFIXES[parts[-1].lower()]
        return full, ""

    def _embed_pil(pil) -> "np.ndarray":
        with torch.no_grad():
            t = clip_preprocess(pil).unsqueeze(0).to(DEVICE)
            f = clip_model.encode_image(t).float()
            f = f / f.norm(dim=-1, keepdim=True)
            if proj_head is not None:
                f = proj_head(f)
        return f.cpu().numpy()[0].astype(np.float32)

    def _identify_crop(crop, top_k: int = 3) -> dict:
        q = _embed_pil(crop)
        sims = CARD_EMB @ q
        order = np.argsort(-sims)[:top_k]
        top = [{
            "name": CARD_NAMES[i],
            "filename": CARD_FILES[i] if i < len(CARD_FILES) else "",
            "score": float(sims[i]),
        } for i in order]
        margin = top[0]["score"] - top[1]["score"] if len(top) > 1 else top[0]["score"]
        return {
            "top": top,
            "best_name": top[0]["name"],
            "best_filename": top[0]["filename"],
            "best_score": top[0]["score"],
            "margin": float(margin),
            "confident": float(margin) >= args.margin_threshold,
        }

    # ---------- FastAPI app ----------
    app = FastAPI(title="data-label-factory identify worker")
    app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

    if os.path.isdir(args.refs):
        app.mount("/refs", StaticFiles(directory=args.refs), name="refs")
        print(f"[serve] mounted /refs/ from {args.refs}", flush=True)

    _state = {"requests": 0, "last_query": ""}
+ _lock = threading.Lock()
190
+
191
+ @app.get("/")
192
+ def root() -> PlainTextResponse:
193
+ return PlainTextResponse(
194
+ f"data-label-factory identify · index={len(CARD_NAMES)} refs · "
195
+ f"requests={_state['requests']} · last_query={_state['last_query']!r}\n"
196
+ f"POST /api/falcon (multipart: image, query) — mac_tensor-shaped\n"
197
+ f"GET /refs/<filename> — reference images\n"
198
+ f"GET /health — JSON status\n"
199
+ )
200
+
201
+ @app.get("/health")
202
+ def health() -> dict:
203
+ return {
204
+ "phase": "ready",
205
+ "model_loaded": True,
206
+ "device": DEVICE,
207
+ "index_size": len(CARD_NAMES),
208
+ "has_projection": proj_head is not None,
209
+ "has_yolo": yolo is not None,
210
+ "sim_threshold": args.sim_threshold,
211
+ "margin_threshold": args.margin_threshold,
212
+ "requests_served": _state["requests"],
213
+ "last_query": _state["last_query"],
214
+ }
215
+
216
+ @app.post("/api/falcon")
217
+ async def falcon(image: UploadFile = File(...), query: str = Form(...)) -> JSONResponse:
218
+ t0 = time.time()
219
+ try:
220
+ pil = Image.open(io.BytesIO(await image.read())).convert("RGB")
221
+ except Exception as e:
222
+ raise HTTPException(400, f"bad image: {e}")
223
+ W, H = pil.size
224
+
225
+ with _lock:
226
+ _state["last_query"] = query
227
+
228
+ masks: list[dict] = []
229
+
230
+ # 1. YOLO multi-card detection (if available)
231
+ if yolo is not None:
232
+ try:
233
+ results = yolo.predict(pil, conf=args.yolo_conf, iou=0.5, verbose=False)
234
+ if results:
235
+ boxes = getattr(results[0], "boxes", None)
236
+ if boxes is not None and boxes.xyxy is not None:
237
+ for x1, y1, x2, y2 in boxes.xyxy.cpu().numpy().tolist():
238
+ bx1, by1 = max(0, int(x1)), max(0, int(y1))
239
+ bx2, by2 = min(W, int(x2)), min(H, int(y2))
240
+ if bx2 - bx1 < 20 or by2 - by1 < 20:
241
+ continue
242
+ crop = pil.crop((bx1, by1, bx2, by2))
243
+ info = _identify_crop(crop)
244
+ if info["best_score"] < args.sim_threshold:
245
+ continue
246
+ name, rarity = _split_name_and_rarity(info["best_name"])
247
+ display = f"{name} ({rarity})" if rarity else name
248
+ if not info["confident"]:
249
+ display = f"{display}?"
250
+ masks.append({
251
+ "bbox_norm": {
252
+ "x1": float(x1) / W, "y1": float(y1) / H,
253
+ "x2": float(x2) / W, "y2": float(y2) / H,
254
+ },
255
+ "area_fraction": float((x2 - x1) * (y2 - y1)) / max(W * H, 1),
256
+ "label": display,
257
+ "name": name,
258
+ "rarity": rarity,
259
+ "score": info["best_score"],
260
+ "top_k": info["top"],
261
+ "margin": info["margin"],
262
+ "confident": info["confident"],
263
+ "ref_filename": info["best_filename"],
264
+ })
265
+ except Exception as e:
266
+ print(f"[serve] yolo error: {e}", flush=True)
267
+
268
+ # 2. Whole-frame fallback (single-card workflow)
269
+ if not masks:
270
+ cx1, cy1 = int(W * 0.10), int(H * 0.05)
271
+ cx2, cy2 = int(W * 0.90), int(H * 0.95)
272
+ center = pil.crop((cx1, cy1, cx2, cy2))
273
+ info = _identify_crop(center)
274
+ if info["best_score"] >= args.sim_threshold and info["confident"]:
275
+ name, rarity = _split_name_and_rarity(info["best_name"])
276
+ display = f"{name} ({rarity})" if rarity else name
277
+ masks.append({
278
+ "bbox_norm": {
279
+ "x1": cx1 / W, "y1": cy1 / H, "x2": cx2 / W, "y2": cy2 / H,
280
+ },
281
+ "area_fraction": (cx2 - cx1) * (cy2 - cy1) / max(W * H, 1),
282
+ "label": display,
283
+ "name": name,
284
+ "rarity": rarity,
285
+ "score": info["best_score"],
286
+ "top_k": info["top"],
287
+ "margin": info["margin"],
288
+ "confident": True,
289
+ "ref_filename": info["best_filename"],
290
+ })
291
+
292
+ with _lock:
293
+ _state["requests"] += 1
294
+
295
+ return JSONResponse(content={
296
+ "image_size": [W, H],
297
+ "count": len(masks),
298
+ "masks": masks,
299
+ "query": query,
300
+ "elapsed_seconds": round(time.time() - t0, 3),
301
+ })
302
+
303
+ print(f"\n[serve] listening on http://{args.host}:{args.port}", flush=True)
304
+ uvicorn.run(app, host=args.host, port=args.port, log_level="warning")
305
+ return 0
306
+
307
+
308
+ if __name__ == "__main__":
309
+ sys.exit(main())
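For callers of the `/api/falcon` endpoint above, the only non-obvious step is mapping `bbox_norm` back to pixels. A minimal client-side parsing sketch, assuming only the response shape `serve.py` emits — the sample payload, label, and helper names are invented for illustration:

```python
import json


def denorm_bbox(bbox_norm: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a bbox_norm dict ({x1,y1,x2,y2} in 0..1) to integer pixel coords."""
    return (
        round(bbox_norm["x1"] * width),
        round(bbox_norm["y1"] * height),
        round(bbox_norm["x2"] * width),
        round(bbox_norm["y2"] * height),
    )


def parse_response(payload: str) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Turn a /api/falcon JSON body into (label, pixel bbox) pairs."""
    data = json.loads(payload)
    w, h = data["image_size"]
    return [(m.get("label", ""), denorm_bbox(m["bbox_norm"], w, h))
            for m in data["masks"]]


if __name__ == "__main__":
    # Invented sample payload mirroring the serve.py response shape.
    sample = json.dumps({
        "image_size": [640, 480],
        "count": 1,
        "masks": [{"bbox_norm": {"x1": 0.25, "y1": 0.1, "x2": 0.75, "y2": 0.9},
                   "area_fraction": 0.4, "label": "Blue-Eyes (UR)"}],
    })
    print(parse_response(sample))
```

The same helper works against the RunPod `pod_falcon_server.py` below, since both endpoints are mac_tensor-shaped.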
data_label_factory/identify/train.py ADDED
@@ -0,0 +1,206 @@
+ """Contrastive fine-tune of a small projection head on top of frozen CLIP.
+
+ Wraps the proven training loop that took the 150-card index from cosine
+ margin 0.074 → 0.36 (5x improvement). The CLIP backbone stays frozen; only
+ a tiny ~400k-param projection MLP is trained, so this runs on Apple Silicon
+ MPS in ~5 minutes for a 150-class set.
+
+ Data generation: K cards × M augmentations per batch (default 16 × 4 = 64).
+ Loss: SupCon (Khosla et al. 2020).
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import os
+ import random
+ import sys
+ import time
+ from pathlib import Path
+
+ # Lazy heavy imports — only triggered when this module is actually invoked.
+
+ DEFAULT_PALETTE_HINT = "ViT-B/32 + 512→512→256 projection"
+
+
+ def main(argv: list[str] | None = None) -> int:
+     parser = argparse.ArgumentParser(
+         prog="data_label_factory.identify train",
+         description=(
+             "Contrastive fine-tune a small projection head on top of frozen CLIP. "
+             "Use this when off-the-shelf CLIP retrieval is too noisy for your "
+             f"reference set. Architecture: {DEFAULT_PALETTE_HINT}."
+         ),
+     )
+     parser.add_argument("--refs", required=True,
+                         help="Folder of reference images (1 per class). Filenames become labels.")
+     parser.add_argument("--out", default="clip_proj.pt",
+                         help="Output path for the trained projection head .pt")
+     parser.add_argument("--epochs", type=int, default=12)
+     parser.add_argument("--k-cards", type=int, default=16,
+                         help="Distinct classes per training batch.")
+     parser.add_argument("--m-augs", type=int, default=4,
+                         help="Augmentations per class per batch.")
+     parser.add_argument("--steps-per-epoch", type=int, default=80)
+     parser.add_argument("--lr", type=float, default=5e-4)
+     parser.add_argument("--temperature", type=float, default=0.1)
+     parser.add_argument("--clip-model", default="ViT-B/32")
+     args = parser.parse_args(argv)
+
+     try:
+         import numpy as np
+         import torch
+         import torch.nn as nn
+         import torch.nn.functional as F
+         from torch.utils.data import Dataset, DataLoader, Sampler
+         from torchvision import transforms
+         from PIL import Image
+         import clip
+     except ImportError as e:
+         raise SystemExit(
+             f"missing dependency: {e}\n"
+             "install with:\n"
+             "  pip install torch torchvision pillow git+https://github.com/openai/CLIP.git"
+         )
+
+     DEVICE = ("mps" if torch.backends.mps.is_available()
+               else "cuda" if torch.cuda.is_available() else "cpu")
+     print(f"[train] device={DEVICE}", flush=True)
+
+     refs = Path(args.refs)
+     if not refs.is_dir():
+         raise SystemExit(f"refs folder not found: {refs}")
+
+     print(f"[train] loading CLIP {args.clip_model} …", flush=True)
+     clip_model, clip_preprocess = clip.load(args.clip_model, device=DEVICE)
+     clip_model.eval()
+     for p in clip_model.parameters():
+         p.requires_grad = False
+
+     class CardDataset(Dataset):
+         def __init__(self, folder: Path, augs_per_card: int):
+             files = sorted(f for f in os.listdir(folder)
+                            if f.lower().endswith((".jpg", ".jpeg", ".png", ".webp")))
+             if not files:
+                 raise SystemExit(f"no images in {folder}")
+             self.images = []
+             for f in files:
+                 self.images.append(Image.open(folder / f).convert("RGB"))
+             self.aug_per_card = augs_per_card
+             self.aug = transforms.Compose([
+                 transforms.RandomResizedCrop(256, scale=(0.6, 1.0), ratio=(0.7, 1.4)),
+                 transforms.RandomRotation(20),
+                 transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
+                 transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
+                 transforms.RandomApply([transforms.GaussianBlur(5, sigma=(0.1, 2.0))], p=0.3),
+                 transforms.RandomGrayscale(p=0.05),
+             ])
+
+         def __len__(self):
+             return len(self.images) * self.aug_per_card
+
+         def __getitem__(self, idx):
+             card_idx = idx % len(self.images)
+             return clip_preprocess(self.aug(self.images[card_idx])), card_idx
+
+     class KCardsSampler(Sampler):
+         def __init__(self, dataset, k_cards: int, m_augs: int, steps: int):
+             self.n = len(dataset.images)
+             self.k = k_cards
+             self.m = m_augs
+             self.steps = steps
+
+         def __iter__(self):
+             for _ in range(self.steps):
+                 cards = random.sample(range(self.n), self.k)
+                 batch = []
+                 for c in cards:
+                     for _ in range(self.m):
+                         batch.append(c)
+                 random.shuffle(batch)
+                 yield from batch
+
+         def __len__(self):
+             return self.steps * self.k * self.m
+
+     class ProjectionHead(nn.Module):
+         def __init__(self, in_dim=512, hidden=512, out_dim=256):
+             super().__init__()
+             self.net = nn.Sequential(
+                 nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim))
+
+         def forward(self, x):
+             return F.normalize(self.net(x), dim=-1)
+
+     def supcon_loss(features: "torch.Tensor", labels: "torch.Tensor", temperature: float) -> "torch.Tensor":
+         device = features.device
+         bsz = features.size(0)
+         labels = labels.contiguous().view(-1, 1)
+         mask = torch.eq(labels, labels.T).float().to(device)
+         sim = torch.matmul(features, features.T) / temperature
+         sim_max, _ = torch.max(sim, dim=1, keepdim=True)
+         logits = sim - sim_max.detach()
+         self_mask = torch.scatter(
+             torch.ones_like(mask), 1,
+             torch.arange(bsz, device=device).view(-1, 1), 0)
+         pos_mask = mask * self_mask
+         exp_logits = torch.exp(logits) * self_mask
+         log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-12)
+         pos_count = pos_mask.sum(1)
+         pos_count = torch.where(pos_count == 0, torch.ones_like(pos_count), pos_count)
+         return -((pos_mask * log_prob).sum(1) / pos_count).mean()
+
+     print(f"[train] dataset from {refs}", flush=True)
+     ds = CardDataset(refs, augs_per_card=args.m_augs)
+     print(f"[train] {len(ds.images)} reference images", flush=True)
+
+     sampler = KCardsSampler(ds, k_cards=args.k_cards, m_augs=args.m_augs,
+                             steps=args.steps_per_epoch)
+     loader = DataLoader(ds, batch_size=args.k_cards * args.m_augs,
+                         sampler=sampler, num_workers=0, drop_last=True)
+
+     head = ProjectionHead(in_dim=512, hidden=512, out_dim=256).to(DEVICE)
+     print(f"[train] projection head: {sum(p.numel() for p in head.parameters()):,} params", flush=True)
+
+     optimizer = torch.optim.AdamW(head.parameters(), lr=args.lr, weight_decay=1e-4)
+     scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
+         optimizer, T_max=args.epochs * args.steps_per_epoch)
+
+     print(f"\n[train] {args.epochs} epochs · {args.steps_per_epoch} steps · "
+           f"batch={args.k_cards * args.m_augs} (K={args.k_cards}×M={args.m_augs})\n", flush=True)
+     t0 = time.time()
+     for epoch in range(args.epochs):
+         head.train()
+         epoch_loss, n_batches = 0.0, 0
+         for imgs, labels in loader:
+             imgs = imgs.to(DEVICE)
+             labels = labels.to(DEVICE)
+             with torch.no_grad():
+                 feats = clip_model.encode_image(imgs).float()
+                 feats = feats / feats.norm(dim=-1, keepdim=True)
+             proj = head(feats)
+             loss = supcon_loss(proj, labels, temperature=args.temperature)
+             optimizer.zero_grad()
+             loss.backward()
+             optimizer.step()
+             scheduler.step()
+             epoch_loss += loss.item()
+             n_batches += 1
+         print(f"[train] epoch {epoch + 1:2d}/{args.epochs} loss={epoch_loss / max(n_batches, 1):.4f} "
+               f"({time.time() - t0:.0f}s)", flush=True)
+
+     out_path = Path(args.out)
+     out_path.parent.mkdir(parents=True, exist_ok=True)
+     torch.save({
+         "state_dict": head.state_dict(),
+         "in_dim": 512, "hidden": 512, "out_dim": 256,
+         "model": args.clip_model,
+         "epochs": args.epochs, "k_cards": args.k_cards, "m_augs": args.m_augs,
+         "ref_count": len(ds.images),
+     }, out_path)
+     print(f"\n[train] ✓ saved {out_path} ({out_path.stat().st_size / 1024:.0f} KB)", flush=True)
+     return 0
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
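The `supcon_loss` above can be sanity-checked outside torch. A NumPy mirror of the same computation — illustrative only, not part of the package — showing that embeddings already clustered by label score a much lower loss than embeddings where positives point in different directions:

```python
import numpy as np


def supcon_loss_np(features: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """NumPy mirror of supcon_loss: pull same-label rows together, push others apart.

    features: (B, D) L2-normalized embeddings; labels: (B,) class ids.
    """
    bsz = features.shape[0]
    mask = (labels[:, None] == labels[None, :]).astype(np.float64)
    sim = features @ features.T / temperature
    logits = sim - sim.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    self_mask = 1.0 - np.eye(bsz)                   # exclude i == j pairs
    pos_mask = mask * self_mask
    exp_logits = np.exp(logits) * self_mask
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True) + 1e-12)
    pos_count = np.maximum(pos_mask.sum(axis=1), 1.0)
    return float(-((pos_mask * log_prob).sum(axis=1) / pos_count).mean())


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    # Same-class rows identical, classes orthogonal → near-zero loss.
    clustered = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    # Same-class rows orthogonal, cross-class rows identical → large loss.
    mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(supcon_loss_np(clustered, labels), supcon_loss_np(mixed, labels))
```

During training the same gradient signal shapes the projection head: augmentations of one card become the same-label rows, the other K−1 cards in the batch become negatives.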
data_label_factory/identify/verify_index.py ADDED
@@ -0,0 +1,102 @@
+ """Self-test the index for top-1 accuracy + report confusable pairs.
+
+ For each reference image, embed it and verify that its top-1 match in the
+ index is itself. Reports the cosine margin between correct and best-wrong
+ matches — the most useful number for predicting live accuracy.
+
+ Run this immediately after building an index to catch bad data BEFORE
+ deploying it to a live tracker.
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import sys
+ from pathlib import Path
+
+
+ def main(argv: list[str] | None = None) -> int:
+     parser = argparse.ArgumentParser(
+         prog="data_label_factory.identify verify",
+         description=(
+             "Self-test a built index. Each reference image should match itself "
+             "as top-1; a wide cosine margin between correct and best-wrong matches "
+             "is the strongest predictor of live accuracy."
+         ),
+     )
+     parser.add_argument("--index", default="card_index.npz", help="Path to .npz from `index`")
+     parser.add_argument("--top-confusables", type=int, default=5,
+                         help="How many of the most-confusable pairs to print")
+     args = parser.parse_args(argv)
+
+     try:
+         import numpy as np
+     except ImportError:
+         raise SystemExit("numpy required: pip install numpy")
+
+     npz = np.load(args.index, allow_pickle=True)
+     EMB = npz["embeddings"]
+     NAMES = list(npz["names"])
+     print(f"[verify] index: {len(NAMES)} refs × {EMB.shape[1]} dims")
+
+     # Pairwise similarity matrix (small N, fits in memory)
+     sims = EMB @ EMB.T
+     np.fill_diagonal(sims, -1.0)
+
+     # Top confusable pairs
+     print("\nMost-confusable pairs (highest cosine sim between DIFFERENT refs):")
+     flat_idx = np.argpartition(sims.flatten(), -args.top_confusables * 2)[-args.top_confusables * 2:]
+     seen = set()
+     shown = 0
+     for fi in flat_idx[np.argsort(sims.flatten()[flat_idx])[::-1]]:
+         i, j = divmod(int(fi), len(NAMES))
+         if (j, i) in seen:
+             continue
+         seen.add((i, j))
+         print(f"  {sims[i, j]:.3f}  {NAMES[i][:42]} ↔ {NAMES[j][:42]}")
+         shown += 1
+         if shown >= args.top_confusables:
+             break
+
+     # Restore diagonal for self-test
+     np.fill_diagonal(sims, 1.0)
+
+     # Self-identity test: each ref's top-1 in EMB @ EMB[i] should be i
+     correct = 0
+     failures = []
+     for i in range(len(NAMES)):
+         row = EMB @ EMB[i]
+         top = int(np.argmax(row))
+         if top == i:
+             correct += 1
+         else:
+             failures.append((NAMES[i], NAMES[top], float(row[top]), float(row[i])))
+
+     pct = correct / len(NAMES) * 100
+     print(f"\nself-identity test: {correct}/{len(NAMES)} = {pct:.1f}% top-1 self-id")
+     for name, mismatch, score_wrong, score_right in failures[:10]:
+         print(f"  ✗ {name[:42]} → matched {mismatch[:42]} "
+               f"(top={score_wrong:.3f} vs self={score_right:.3f})")
+
+     # Margin analysis: gap between "I matched myself" and "best wrong match"
+     correct_scores, best_wrong_scores = [], []
+     for i in range(len(NAMES)):
+         row = EMB @ EMB[i]
+         correct_scores.append(row[i])
+         row[i] = -1
+         best_wrong_scores.append(row.max())
+
+     median_correct = float(np.median(correct_scores))
+     median_wrong = float(np.median(best_wrong_scores))
+     margin = median_correct - median_wrong
+     print("\nthreshold analysis:")
+     print(f"  median correct match score:    {median_correct:.3f}")
+     print(f"  median best-wrong-match score: {median_wrong:.3f}")
+     print(f"  gap (margin):                  {margin:.3f}")
+     suggested = max(0.5, median_wrong + 0.05)
+     print(f"  → recommended SIM_THRESHOLD = {suggested:.2f}")
+     return 0 if pct >= 99 else 1
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
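The threshold analysis in `verify_index.py` boils down to a few lines of array arithmetic. A toy sketch of the same margin computation — the identity-matrix embeddings are invented for illustration; a real index holds L2-normalized CLIP vectors:

```python
import numpy as np


def margin_report(emb: np.ndarray) -> tuple[float, float, float]:
    """Median self-match score, median best-wrong-match score, and their gap.

    emb: (N, D) L2-normalized reference embeddings, mirroring verify_index.py.
    """
    sims = emb @ emb.T
    correct = np.diag(sims).copy()     # self-similarity (≈ 1.0 when normalized)
    np.fill_diagonal(sims, -1.0)       # exclude self when finding the best wrong match
    best_wrong = sims.max(axis=1)
    median_correct = float(np.median(correct))
    median_wrong = float(np.median(best_wrong))
    return median_correct, median_wrong, median_correct - median_wrong


if __name__ == "__main__":
    # Three perfectly separated unit vectors → the widest possible margin.
    emb = np.eye(3)
    mc, mw, gap = margin_report(emb)
    print(f"median correct={mc:.3f} best-wrong={mw:.3f} margin={gap:.3f}")
    print(f"recommended SIM_THRESHOLD = {max(0.5, mw + 0.05):.2f}")
```

A shrinking gap here is the early warning that two references will be confused on the live camera, which is exactly what the `train` step above is for.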
data_label_factory/runpod/pod_falcon_server.py ADDED
@@ -0,0 +1,411 @@
+ #!/usr/bin/env python3
+ """
+ pod_falcon_server.py — single-file Falcon Perception HTTP server for a RunPod pod.
+
+ Designed to be curl-installed via the pod's dockerStartCmd. Two phases:
+
+ 1. **Boot phase (instant):** start a FastAPI server on 0.0.0.0:8000 with two
+    endpoints: `/health` (always responds) and `/api/falcon` (returns 503 until
+    the model is loaded). A background thread starts heavy installation.
+
+ 2. **Install phase (~5-10 min):** install pip deps, install falcon-perception
+    with --no-deps, download the Falcon model from Hugging Face, instantiate
+    the inference engine. As soon as it's ready, `/api/falcon` flips to live.
+
+ The endpoint shape MATCHES mac_tensor's /api/falcon so the existing
+ `web/app/api/falcon-frame/route.ts` proxy works against it without changes:
+
+     Request:  multipart/form-data with `image` (file) + `query` (string)
+     Response: {
+         "image_size": [w, h],
+         "count": int,
+         "masks": [{"bbox_norm": {x1, y1, x2, y2}, "area_fraction": float}, ...],
+         "elapsed_seconds": float,
+         "cold_start": bool
+     }
+
+ You can poll progress via:
+     curl https://<pod-id>-8000.proxy.runpod.net/health
+ """
+
+ import io
+ import os
+ import subprocess
+ import sys
+ import threading
+ import time
+ import traceback
+ from typing import Any
+
+ # ============================================================
+ # Boot phase — keep imports minimal so the server starts FAST
+ # ============================================================
+
+ print("[server] starting boot phase…", flush=True)
+ BOOT_T0 = time.time()
+
+ # Install fastapi + uvicorn synchronously since we need them for the boot server.
+ # These are tiny (~30 MB) so this takes ~10 seconds.
+ def _pip(args, retries=3):
+     for attempt in range(retries):
+         r = subprocess.run(
+             [sys.executable, "-m", "pip", "install", "--quiet", "--no-cache-dir"] + args,
+             capture_output=True, text=True,
+         )
+         if r.returncode == 0:
+             return True
+         print(f"[pip] attempt {attempt + 1} failed: {r.stderr[:300]}", flush=True)
+         time.sleep(3)
+     return False
+
+ print("[server] installing fastapi + uvicorn + multipart…", flush=True)
+ if not _pip(["fastapi==0.115.6", "uvicorn[standard]==0.32.1", "python-multipart==0.0.20", "pillow"]):
+     print("[server] CRITICAL: failed to install fastapi", flush=True)
+     sys.exit(1)
+
+ # Now we can import fastapi
+ from fastapi import FastAPI, Request, UploadFile, File, Form, HTTPException
+ from fastapi.responses import JSONResponse, PlainTextResponse
+ from PIL import Image
+
+ app = FastAPI(title="data-label-factory falcon worker")
+
+ STATE: dict[str, Any] = {
+     "phase": "boot",
+     "boot_started": BOOT_T0,
+     "model_loaded": False,
+     "install_log": [],
+     "error": None,
+     "model_id": None,
+     "device": None,
+     "load_seconds": None,
+     "cold_start_used": False,
+     "requests_served": 0,
+ }
+
+ def _log(msg: str) -> None:
+     line = f"[{time.time() - BOOT_T0:6.1f}s] {msg}"
+     print(line, flush=True)
+     STATE["install_log"].append(line)
+     # cap log to last 200 lines
+     if len(STATE["install_log"]) > 200:
+         STATE["install_log"] = STATE["install_log"][-200:]
+
+
+ # ============================================================
+ # Endpoints
+ # ============================================================
+
+ @app.get("/")
+ def root() -> PlainTextResponse:
+     return PlainTextResponse(
+         f"data-label-factory falcon worker · phase={STATE['phase']} "
+         f"loaded={STATE['model_loaded']} requests={STATE['requests_served']}\n"
+         "see /health for full status, POST /api/falcon for inference\n"
+     )
+
+
+ @app.get("/health")
+ def health() -> dict:
+     return {
+         "phase": STATE["phase"],
+         "model_loaded": STATE["model_loaded"],
+         "model_id": STATE.get("model_id"),
+         "device": STATE.get("device"),
+         "load_seconds": STATE.get("load_seconds"),
+         "uptime_seconds": round(time.time() - BOOT_T0, 1),
+         "requests_served": STATE["requests_served"],
+         "error": STATE["error"],
+         "recent_log": STATE["install_log"][-30:],
+     }
+
+
+ @app.post("/api/falcon")
+ async def falcon(image: UploadFile = File(...), query: str = Form(...)) -> JSONResponse:
+     if not STATE["model_loaded"]:
+         return JSONResponse(
+             status_code=503,
+             content={
+                 "error": "model not loaded yet",
+                 "phase": STATE["phase"],
+                 "loaded": False,
+                 "uptime": round(time.time() - BOOT_T0, 1),
+                 "recent": STATE["install_log"][-5:],
+             },
+         )
+
+     t0 = time.time()
+     img_bytes = await image.read()
+     try:
+         pil = Image.open(io.BytesIO(img_bytes)).convert("RGB")
+     except Exception as e:
+         return JSONResponse(status_code=400, content={"error": f"bad image: {e}"})
+
+     cold = not STATE["cold_start_used"]
+     STATE["cold_start_used"] = True
+
+     try:
+         result = _run_inference(pil, query)
+     except Exception as e:
+         return JSONResponse(
+             status_code=500,
+             content={"error": str(e), "trace": traceback.format_exc().splitlines()[-6:]},
+         )
+
+     STATE["requests_served"] += 1
+     return JSONResponse(content={
+         "image_size": [pil.width, pil.height],
+         "count": result["count"],
+         "masks": result["masks"],
+         "query": query,
+         "elapsed_seconds": round(time.time() - t0, 3),
+         "cold_start": cold,
+     })
+
+
+ # ============================================================
+ # Heavy install + inference (loaded in background thread)
+ # ============================================================
+
+ _engine = None
+ _tokenizer = None
+ _image_processor = None
+ _model = None
+ _model_args = None
+ _sampling_params = None
+ _torch = None  # cached torch module reference
+
+
+ def _run_inference(pil_img: "Image.Image", query: str) -> dict:
+     """Single-image Falcon Perception forward pass.
+
+     Uses task='segmentation' per the prior session learning ('detection mode
+     returns empty bboxes'). Extracts bboxes from each segmentation mask via
+     pycocotools mask decoding.
+     """
+     if _engine is None:
+         raise RuntimeError("model not loaded")
+
+     from falcon_perception import build_prompt_for_task  # type: ignore
+     from falcon_perception.paged_inference import Sequence  # type: ignore
+
+     W, H = pil_img.size
+     task = "segmentation" if getattr(_model_args, "do_segmentation", False) else "detection"
+     prompt = build_prompt_for_task(query, task)
+
+     sequences = [Sequence(
+         text=prompt,
+         image=pil_img,
+         min_image_size=256,
+         max_image_size=1024,
+         task=task,
+     )]
+     with _torch.inference_mode():
+         _engine.generate(
+             sequences,
+             sampling_params=_sampling_params,
+             use_tqdm=False,
+             print_stats=False,
+         )
+     seq = sequences[0]
+     aux = seq.output_aux
+
+     masks_out: list[dict] = []
+
+     # Path A: detection mode (bboxes_raw is populated)
+     bboxes_raw = getattr(aux, "bboxes_raw", None)
+     if bboxes_raw:
+         try:
+             from falcon_perception.visualization_utils import pair_bbox_entries  # type: ignore
+             pairs = pair_bbox_entries(bboxes_raw)
+             for entry in pairs:
+                 if hasattr(entry, "_asdict"):
+                     d = entry._asdict()
+                 elif isinstance(entry, dict):
+                     d = entry
+                 else:
+                     vals = list(entry)
+                     if len(vals) < 5:
+                         continue
+                     d = {"x1": vals[1], "y1": vals[2], "x2": vals[3], "y2": vals[4]}
+                 x1 = float(d.get("x1", 0)); y1 = float(d.get("y1", 0))
+                 x2 = float(d.get("x2", 0)); y2 = float(d.get("y2", 0))
+                 masks_out.append({
+                     "bbox_norm": {
+                         "x1": x1 / W if x1 > 1.5 else x1,
+                         "y1": y1 / H if y1 > 1.5 else y1,
+                         "x2": x2 / W if x2 > 1.5 else x2,
+                         "y2": y2 / H if y2 > 1.5 else y2,
+                     },
+                     "area_fraction": ((x2 - x1) * (y2 - y1)) / (W * H) if W and H else 0.0,
+                 })
+         except Exception as e:
+             _log(f"pair_bbox_entries failed: {e}")
+
+     # Path B: segmentation mode (masks_rle is populated)
+     if not masks_out:
+         masks_rle = getattr(aux, "masks_rle", None) or []
+         for m in masks_rle:
+             try:
+                 # Try to extract a bbox from the mask. Multiple possible shapes.
+                 if isinstance(m, dict) and "bbox" in m:
+                     bb = m["bbox"]  # could be [x,y,w,h] or [x1,y1,x2,y2]
+                     if len(bb) == 4:
+                         x1, y1 = float(bb[0]), float(bb[1])
+                         # Heuristic: if last two are smaller than first two, treat as w/h
+                         if bb[2] < bb[0] or bb[3] < bb[1]:
+                             x2, y2 = x1 + float(bb[2]), y1 + float(bb[3])
+                         else:
+                             x2, y2 = float(bb[2]), float(bb[3])
+                         masks_out.append({
+                             "bbox_norm": {
+                                 "x1": x1 / W if x1 > 1.5 else x1,
+                                 "y1": y1 / H if y1 > 1.5 else y1,
+                                 "x2": x2 / W if x2 > 1.5 else x2,
+                                 "y2": y2 / H if y2 > 1.5 else y2,
+                             },
+                             "area_fraction": float(m.get("area", (x2 - x1) * (y2 - y1))) / max(W * H, 1),
+                         })
+                     continue
+                 # Fall back to decoding the RLE mask via pycocotools
+                 from pycocotools import mask as maskUtils  # type: ignore
+                 import numpy as np  # type: ignore
+                 rle = m if isinstance(m, dict) else {"counts": m, "size": [H, W]}
+                 if "size" not in rle:
+                     rle["size"] = [H, W]
+                 if isinstance(rle.get("counts"), str):
+                     rle["counts"] = rle["counts"].encode()
+                 decoded = maskUtils.decode(rle)
+                 if decoded is None or decoded.size == 0:
+                     continue
+                 ys, xs = np.where(decoded > 0)
+                 if xs.size == 0:
+                     continue
+                 x1, y1 = int(xs.min()), int(ys.min())
+                 x2, y2 = int(xs.max()), int(ys.max())
+                 masks_out.append({
+                     "bbox_norm": {"x1": x1 / W, "y1": y1 / H, "x2": x2 / W, "y2": y2 / H},
+                     "area_fraction": float(decoded.sum()) / max(W * H, 1),
+                 })
+             except Exception as e:
+                 _log(f"mask parse failed: {e}")
+
+     return {"count": len(masks_out), "masks": masks_out}
+
+
+ def _heavy_install_and_load() -> None:
+     """Background thread: install heavy deps, download model, load inference engine."""
+     global _engine, _tokenizer, _image_processor, _model, _model_args, _sampling_params, _torch
+     try:
+         STATE["phase"] = "installing pip"
+         _log("installing transformers + qwen-vl-utils + accelerate + safetensors …")
+         if not _pip([
+             "transformers>=4.49.0,<5",
+             "qwen-vl-utils>=0.0.10",
+             "accelerate>=0.34",
+             "safetensors>=0.4",
+             "einops>=0.8.0",
+             "opencv-python>=4.10.0",
+             "scipy>=1.13.0",
+             "pycocotools>=2.0.7",
+             "tyro>=0.8.0",
+             "huggingface_hub>=0.26",
+             "numpy<2",  # falcon-perception is happier with numpy 1.x
+         ]):
+             raise RuntimeError("pip install of heavy deps failed")
+
+         STATE["phase"] = "installing falcon-perception"
+         _log("installing falcon-perception (--no-deps to preserve base torch)…")
+         if not _pip(["--no-deps", "falcon-perception"]):
+             raise RuntimeError("pip install of falcon-perception failed")
+
+         STATE["phase"] = "loading model"
+         _log("importing torch + falcon_perception …")
+         import torch as _t  # type: ignore
+         _torch = _t
+
+         from falcon_perception import (  # type: ignore
+             PERCEPTION_MODEL_ID,
+             build_prompt_for_task,
+             load_and_prepare_model,
+             setup_torch_config,
+         )
+         from falcon_perception.data import ImageProcessor  # type: ignore
+         from falcon_perception.paged_inference import (  # type: ignore
+             PagedInferenceEngine,
+             SamplingParams,
+             Sequence,
+         )
+
+         STATE["model_id"] = PERCEPTION_MODEL_ID
+         _log(f"model id: {PERCEPTION_MODEL_ID}")
+         _log("setting up torch …")
+         setup_torch_config()
+
+         _log("loading model + processor (downloads ~600 MB on first run, may take 2-5 min)…")
+         load_t0 = time.time()
+         _model, _tokenizer, _model_args = load_and_prepare_model(
+             hf_model_id=PERCEPTION_MODEL_ID,
+             hf_revision="main",
+             hf_local_dir=None,
+             device=None,  # let model pick CUDA
+             dtype="bfloat16",
+             compile=False,  # skip torch.compile to keep load fast (~30s vs 60s+)
+         )
+         _log("instantiating ImageProcessor + PagedInferenceEngine…")
+         _image_processor = ImageProcessor(patch_size=16, merge_size=1)
+         _engine = PagedInferenceEngine(
+             _model, _tokenizer, _image_processor,
+             max_batch_size=1,
+             max_seq_length=8192,
+             n_pages=128,
+             page_size=128,
+             prefill_length_limit=8192,
+             enable_hr_cache=False,
+             capture_cudagraph=False,
+         )
+         _sampling_params = SamplingParams(
+             stop_token_ids=[_tokenizer.eos_token_id, _tokenizer.end_of_query_token_id],
+         )
+
+         STATE["load_seconds"] = round(time.time() - load_t0, 1)
+         STATE["device"] = "cuda" if _torch.cuda.is_available() else "cpu"
+
+         # Quick warmup so the first real request isn't 30s slower than steady state
+         _log("warmup pass on a dummy image…")
+         warmup_img = Image.new("RGB", (256, 256), color=(128, 128, 128))
+         warmup_seqs = [Sequence(
+             text=build_prompt_for_task("anything", "detection"),
+             image=warmup_img,
+             min_image_size=256,
+             max_image_size=512,
+             task="detection",
+         )]
+         with _torch.inference_mode():
+             _engine.generate(warmup_seqs, sampling_params=_sampling_params,
+                              use_tqdm=False, print_stats=False)
+
+         STATE["phase"] = "ready"
+         STATE["model_loaded"] = True
+         _log(f"✓ READY in {time.time() - BOOT_T0:.1f}s total")
+
+     except Exception as e:
+         STATE["phase"] = "FAILED"
+         STATE["error"] = str(e)
+         _log(f"FATAL: {e}")
+         _log(traceback.format_exc())
+
+
+ # Kick off the install thread now (server hasn't started yet but the import is done)
+ threading.Thread(target=_heavy_install_and_load, daemon=True).start()
+
+
+ # ============================================================
+ # Run the server
+ # ============================================================
+
+ if __name__ == "__main__":
+     import uvicorn
+     port = int(os.environ.get("PORT", "8000"))
+     _log(f"booting uvicorn on 0.0.0.0:{port}")
411
+ uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")
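
The pattern in the server above, answering status requests immediately while a daemon thread does the heavy install/load and flips `STATE["phase"]`, can be sketched without FastAPI. This is an illustrative stand-in: `_slow_load` and the `time.sleep` calls are mine, not from the repo.

```python
import threading
import time

STATE = {"phase": "booting", "model_loaded": False, "error": None}

def _slow_load() -> None:
    """Stand-in for the pip-install + model-load work (illustrative only)."""
    try:
        STATE["phase"] = "installing pip"
        time.sleep(0.05)   # pretend to install the heavy deps
        STATE["phase"] = "loading model"
        time.sleep(0.05)   # pretend to load weights + run the warmup pass
        STATE["phase"] = "ready"
        STATE["model_loaded"] = True
    except Exception as e:  # mirror the server's FAILED path
        STATE["phase"] = "FAILED"
        STATE["error"] = str(e)

# Kick off the loader before the (hypothetical) server starts; a /status
# handler would simply return a copy of STATE while this runs.
t = threading.Thread(target=_slow_load, daemon=True)
t.start()
print(STATE["phase"])  # some early phase: the caller is not blocked
t.join()
print(STATE["phase"])  # → ready
```

The point of the daemon flag is that a crash or exit of the main process never waits on the loader; the try/except keeps a failed install visible in `STATE` instead of silently killing the thread.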
pyproject.toml CHANGED
@@ -57,6 +57,18 @@ runpod = [
      "datasets>=3.0",
      "pyarrow>=17.0",
  ]
+ identify = [
+     # Open-set CLIP retrieval: build/verify/train/serve a card-style index
+     "torch>=2.1",
+     "torchvision>=0.16",
+     "numpy>=1.24,<2",
+     "fastapi>=0.115",
+     "uvicorn[standard]>=0.32",
+     "python-multipart>=0.0.20",
+     "pillow>=10.0",
+     "ultralytics>=8.3",
+     "clip @ git+https://github.com/openai/CLIP.git",
+ ]
  dev = [
      "pytest>=7.0",
      "ruff>=0.5.0",
@@ -73,8 +85,13 @@ data_label_factory = "data_label_factory.cli:main"
  data-label-factory = "data_label_factory.cli:main"

  [tool.setuptools]
- packages = ["data_label_factory", "data_label_factory.runpod"]
+ packages = [
+     "data_label_factory",
+     "data_label_factory.runpod",
+     "data_label_factory.identify",
+ ]

  [tool.setuptools.package-data]
  data_label_factory = ["*.py"]
  "data_label_factory.runpod" = ["*.py", "*.md", "Dockerfile", "requirements-pod.txt"]
+ "data_label_factory.identify" = ["*.py", "*.md"]
web/app/api/falcon-frame/route.ts CHANGED
@@ -24,6 +24,9 @@ type Bbox = {
    y2: number;
    score: number;
    label: string;
+   ref_url?: string; // URL to a reference image (for the live tracker sidebar)
+   margin?: number;
+   confident?: boolean;
  };

  const FALCON_URL = process.env.FALCON_URL ?? "http://localhost:8500/api/falcon";
@@ -96,17 +99,30 @@ export async function POST(req: NextRequest) {
      upstreamCount = data.count ?? 0;
      imgW = data.image_size?.[0] ?? data.width ?? 0;
      imgH = data.image_size?.[1] ?? data.height ?? 0;
-     // mac_tensor returns masks: [{bbox_norm: {x1,y1,x2,y2}, slot, area_fraction}]
+     // mac_tensor returns masks: [{bbox_norm:{x1,y1,x2,y2}, area_fraction, label?, score?, ref_filename?}]
+     // The label/score/ref_filename are present in identify-mode (CLIP retrieval).
+     // Construct an absolute ref_url from the upstream base + filename so the
+     // browser can render the reference card image directly without an extra
+     // proxy hop.
+     const upstreamBase = new URL(FALCON_URL);
+     upstreamBase.pathname = "/refs/";
      for (const m of data.masks ?? []) {
        const bn = m.bbox_norm ?? {};
        if (bn.x1 == null) continue;
+       let ref_url: string | undefined = undefined;
+       if (typeof m.ref_filename === "string" && m.ref_filename) {
+         ref_url = upstreamBase.toString() + m.ref_filename;
+       }
        bboxes.push({
          x1: bn.x1,
          y1: bn.y1,
          x2: bn.x2,
          y2: bn.y2,
-         score: m.area_fraction ?? 1,
-         label: query,
+         score: typeof m.score === "number" ? m.score : (m.area_fraction ?? 1),
+         label: typeof m.label === "string" && m.label ? m.label : query,
+         ref_url,
+         margin: typeof m.margin === "number" ? m.margin : undefined,
+         confident: typeof m.confident === "boolean" ? m.confident : undefined,
        });
      }
    }
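
The `ref_url` construction in the hunk above rebases the upstream URL onto `/refs/` and appends the filename. A rough Python equivalent of that `new URL(FALCON_URL)` + pathname swap (the function name and the `charizard.jpg` filename are illustrative, not from the repo):

```python
from urllib.parse import urlsplit, urlunsplit

def ref_url_for(falcon_url: str, ref_filename: str) -> str:
    """Keep scheme + host of the upstream URL, replace its path with
    /refs/<filename>, and drop query/fragment (illustrative helper)."""
    parts = urlsplit(falcon_url)
    return urlunsplit((parts.scheme, parts.netloc, "/refs/" + ref_filename, "", ""))

print(ref_url_for("http://localhost:8500/api/falcon", "charizard.jpg"))
# → http://localhost:8500/refs/charizard.jpg
```

Doing the rebase once per request (outside the mask loop, as the route does) avoids reparsing the base URL for every detection.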
web/app/canvas/live/page.tsx CHANGED
@@ -25,7 +25,14 @@ type SourceMode = "idle" | "file" | "webcam";
  type FalconResponse = {
    ok: boolean;
    count?: number;
-   bboxes?: Array<{ x1: number; y1: number; x2: number; y2: number; score: number; label: string }>;
+   bboxes?: Array<{
+     x1: number; y1: number; x2: number; y2: number;
+     score: number;
+     label: string;
+     ref_url?: string;
+     margin?: number;
+     confident?: boolean;
+   }>;
    image_size?: { w: number; h: number };
    elapsed_ms?: number;
    upstream?: string;
@@ -43,6 +50,11 @@ export default function LiveTrackerPage() {
    const streamRef = useRef<MediaStream | null>(null);
    const objectUrlRef = useRef<string | null>(null);

+   // Live query ref — read from inside the sendNextFrame loop instead of
+   // closure-captured `query` to avoid stale-closure bugs when the user
+   // types a new query mid-stream.
+   const queryRef = useRef<string>("fiber optic drone");
+
    const [mode, setMode] = useState<SourceMode>("idle");
    const [query, setQuery] = useState<string>("fiber optic drone");
    const [activeTracks, setActiveTracks] = useState<Track[]>([]);
@@ -98,7 +110,7 @@ export default function LiveTrackerPage() {
    const form = new FormData();
    form.set("image", blob, "frame.jpg");
-   form.set("query", query);
+   form.set("query", queryRef.current);

    let resp: FalconResponse;
    try {
@@ -133,6 +145,7 @@ export default function LiveTrackerPage() {
      y2: isNormalized ? b.y2 * H : b.y2,
      score: b.score,
      label: b.label,
+     ref_url: b.ref_url,
    };
  });
@@ -347,7 +360,10 @@ export default function LiveTrackerPage() {
    <input
      type="text"
      value={query}
-     onChange={(e) => setQuery(e.target.value)}
+     onChange={(e) => {
+       setQuery(e.target.value);
+       queryRef.current = e.target.value;
+     }}
      className="px-3 py-1.5 rounded-md bg-zinc-800 border border-zinc-700 text-zinc-100 text-sm w-64 focus:outline-none focus:ring-2 focus:ring-cyan-500"
      placeholder="e.g. fiber optic drone"
    />
@@ -394,19 +410,47 @@ export default function LiveTrackerPage() {
    {activeTracks.length === 0 ? (
      <div className="text-sm text-zinc-500">none yet</div>
    ) : (
-     <div className="space-y-2">
+     <div className="space-y-3">
        {activeTracks.map((t) => (
-         <div key={t.id} className="flex items-center justify-between text-sm">
-           <div className="flex items-center gap-2 min-w-0">
-             <span
-               className="h-3 w-3 rounded-sm border border-zinc-600 flex-shrink-0"
-               style={{ backgroundColor: t.color }}
-             />
-             <span className="text-zinc-100 truncate">#{t.id} {t.label}</span>
+         <div
+           key={t.id}
+           className="rounded-md border border-zinc-800 bg-zinc-950 p-2"
+         >
+           <div className="flex items-start gap-3">
+             {/* Reference card image (if the backend provided one) */}
+             {t.ref_url ? (
+               /* eslint-disable-next-line @next/next/no-img-element */
+               <img
+                 src={t.ref_url}
+                 alt={t.label}
+                 className="w-16 h-auto rounded border-2 flex-shrink-0"
+                 style={{ borderColor: t.color }}
+               />
+             ) : (
+               <div
+                 className="w-16 h-22 rounded border-2 flex-shrink-0 flex items-center justify-center text-xs text-zinc-600"
+                 style={{ borderColor: t.color }}
+               >
+                 no ref
+               </div>
+             )}
+             <div className="flex-1 min-w-0">
+               <div className="flex items-center gap-1.5">
+                 <span
+                   className="h-2 w-2 rounded-sm flex-shrink-0"
+                   style={{ backgroundColor: t.color }}
+                 />
+                 <span className="text-xs text-zinc-500 font-mono">#{t.id}</span>
+               </div>
+               <div className="text-sm text-zinc-100 leading-tight mt-1 break-words">
+                 {t.label}
+               </div>
+               <div className="text-xs text-zinc-500 mt-1.5 font-mono">
+                 {typeof t.score === "number" ? `score ${t.score.toFixed(2)} · ` : ""}
+                 seen {t.hits}/{t.age}f
+               </div>
+             </div>
            </div>
-           <span className="text-zinc-400 text-xs whitespace-nowrap ml-2">
-             {t.hits}/{t.age}f
-           </span>
          </div>
        ))}
      </div>
web/lib/iou-tracker.ts CHANGED
@@ -12,6 +12,7 @@ export type Detection = {
    y2: number;
    score?: number;
    label?: string;
+   ref_url?: string;
  };

  export type Track = Detection & {
@@ -103,6 +104,7 @@ export class IoUTracker {
    t.x1 = d.x1; t.y1 = d.y1; t.x2 = d.x2; t.y2 = d.y2;
    t.score = d.score ?? t.score;
    t.label = d.label ?? t.label;
+   t.ref_url = d.ref_url ?? t.ref_url;
    t.age += 1;
    t.hits += 1;
    t.framesSinceSeen = 0;
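
The matched-track update above keeps the last non-null `ref_url` so the sidebar card does not flicker when a frame's detection omits it. The IoU matching itself is not shown in this diff; a rough Python sketch of the two pieces (both helpers are mine, with field names following the TS `Track` type, and are not the tracker's actual implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def update_track(track: dict, det: dict) -> dict:
    """Mirror the matched-track update: overwrite geometry, keep the last
    non-null ref_url, bump age/hits, reset framesSinceSeen (illustrative)."""
    track.update({k: det[k] for k in ("x1", "y1", "x2", "y2")})
    track["ref_url"] = det.get("ref_url") or track.get("ref_url")
    track["age"] += 1
    track["hits"] += 1
    track["framesSinceSeen"] = 0
    return track

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # → 0.143
```

A tracker would compute `iou` between each live track and each new detection, greedily assign pairs above a threshold, and run `update_track` on the matches; unmatched detections become new tracks.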