checkbox-detector
A YOLO12n model that detects checked and unchecked checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.
Quick start
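import natural_pdf as npdf
pdf = npdf.PDF("form.pdf")
# Run the checkbox detector on the first page
checkboxes = pdf.pages[0].detect_checkboxes()
for cb in checkboxes:
    # Checked state, detection confidence, and bounding box
    print(cb.is_checked, cb.confidence, cb.bbox)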
The model downloads automatically via huggingface_hub.
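As a follow-up to the loop above, here is a minimal sketch of tallying detections by state. It relies only on the attributes shown in the quick start (`is_checked`, `confidence`, `bbox`). The `summarize_checkboxes` helper and the 0.5 cutoff are hypothetical and not part of natural-pdf, and `SimpleNamespace` objects stand in for real detections so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def summarize_checkboxes(checkboxes, min_confidence=0.5):
    """Count checked and unchecked detections at or above a confidence cutoff.

    Hypothetical helper: it only assumes each item exposes `is_checked`
    and `confidence`, as in the quick-start loop.
    """
    counts = {"checked": 0, "unchecked": 0}
    for cb in checkboxes:
        if cb.confidence < min_confidence:
            continue  # skip low-confidence detections
        counts["checked" if cb.is_checked else "unchecked"] += 1
    return counts


# Stand-in detections (made-up values) so the example is self-contained.
fake = [
    SimpleNamespace(is_checked=True, confidence=0.91, bbox=(40, 52, 52, 64)),
    SimpleNamespace(is_checked=False, confidence=0.88, bbox=(40, 80, 52, 92)),
    SimpleNamespace(is_checked=False, confidence=0.31, bbox=(40, 108, 52, 120)),
]
print(summarize_checkboxes(fake))
```

With real data, the list returned by `detect_checkboxes()` could be passed in place of `fake`, assuming it is iterable as in the quick start.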
Model details
| Property | Value |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: checkbox_checked, checkbox_unchecked |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |
Training data
This revision is fine-tuned from the earlier checkbox detector on manually reviewed DocumentCloud pages with bounding-box labels for checked and unchecked checkboxes. The fine-tuning set puts extra emphasis on forms with small checkboxes, which the first public checkpoint under-recalled.
The original base dataset used ~5,100 document page images from two sources:
- DocumentCloud: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like "check all that apply" and "inspection checklist". Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- Derived from CommonForms (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.
The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).
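The sliding-window layout described above can be sketched as follows. This is an illustration under stated assumptions, not the actual dataset-preparation code: the function names are hypothetical, the stride is taken as the tile size minus the 20% overlap, and the last window on each axis is assumed to be shifted back so it ends flush with the image edge.

```python
def tile_origins(length, tile=1024, overlap=0.2):
    """Start offsets of sliding windows covering [0, length) along one axis."""
    if length <= tile:
        return [0]  # a single window already covers the axis
    stride = int(tile * (1 - overlap))  # 819 px for a 1024 px tile at 20% overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # assumed: final window ends at the image edge
    return starts


def tile_boxes(width, height, tile=1024, overlap=0.2):
    """(x0, y0, x1, y1) crop boxes in row-major order for a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


# Example: a 1700 x 2200 px scan yields a 2 x 3 grid of tiles.
for box in tile_boxes(1700, 2200):
    print(box)
```

Shifting the final window back keeps every tile at the full 1024x1024 size, at the cost of a larger overlap on the last row and column. Padding the edge tiles instead would be an equally plausible choice.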
| Split | Source images | Tiles |
|---|---|---|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
Performance
Fine-tune validation metrics on tiled validation images:
| Class | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| All | 0.875 | 0.834 | 0.857 | 0.559 |
| checkbox_checked | 0.816 | 0.743 | 0.757 | 0.446 |
| checkbox_unchecked | 0.934 | 0.925 | 0.957 | 0.672 |
Results on 102 completed, manually reviewed gold pages from the current labeling set:
| Inference mode | Precision | Recall | Checked recall | Unchecked recall |
|---|---|---|---|---|
| Default 1x pass | 0.962 | 0.948 | 0.878 | 0.963 |
| natural-pdf magnify="auto" | 0.950 | 0.962 | 0.912 | 0.973 |
Inference details
natural-pdf renders pages at 72 DPI for this model to match the fine-tuning scale. When the rendered image exceeds the 1024 x 1024 model input, it uses SAHI-style 1024x1024 tiling. It can also run an optional magnified second pass when the detected checkboxes are small.
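To make the rendering and tiling decision concrete, here is a small sketch of the arithmetic involved. The helper names are hypothetical and this is not natural-pdf's internal code. It uses the fact that PDF page sizes are given in points (1/72 inch), so at 72 DPI one point maps to one pixel.

```python
def rendered_size(width_pt, height_pt, dpi=72):
    """Pixel size of a page rendered at `dpi`; PDF points are 1/72 inch."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def needs_tiling(width_px, height_px, model_input=1024):
    """True when either side of the rendered image exceeds the model input size."""
    return width_px > model_input or height_px > model_input


def to_page_coords(box, tile_origin):
    """Shift a tile-local (x0, y0, x1, y1) box back into page pixel coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


# US Letter (612 x 792 pt) fits within 1024 x 1024 at 72 DPI.
letter = rendered_size(612, 792)
print(letter, needs_tiling(*letter))

# Tabloid (792 x 1224 pt) is taller than 1024 px, so it would be tiled.
tabloid = rendered_size(792, 1224)
print(tabloid, needs_tiling(*tabloid))
```

When tiling is used, each detection comes back in tile-local coordinates, and a shift like `to_page_coords` is needed before overlapping detections from neighboring tiles can be merged. How natural-pdf merges them is not specified here.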
Inference requires only onnxruntime, numpy, Pillow, and huggingface_hub. Neither PyTorch nor Ultralytics is needed at inference time.
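A quick way to confirm that the runtime dependencies are present is sketched below. The `missing_packages` helper is hypothetical. The mapping pairs each package name with the module it is normally imported as (Pillow is imported as `PIL`).

```python
from importlib.util import find_spec

# Package name -> import name for the four runtime dependencies listed above.
REQUIRED = {
    "onnxruntime": "onnxruntime",
    "numpy": "numpy",
    "Pillow": "PIL",
    "huggingface_hub": "huggingface_hub",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All inference dependencies are installed.")
```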
Background
This model was inspired by FFDNet-L, a form field detector that can find unchecked checkboxes (as choice_button) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.
License
Apache 2.0. Training data derived in part from CommonForms (Apache 2.0).