---

## What I Learned Getting This to Work Well
Getting TripoSR to produce clean 3D meshes on a phone took more work than just converting the model to ONNX. The raw model expects a very specific kind of input — a single object, centered, on a neutral background — and if you just feed it a raw photo, the results are pretty rough.
The biggest improvement came from **stripping the background** before inference. I'm using Apple's **Vision framework** (`VNGenerateForegroundInstanceMaskRequest` on iOS 17+) to automatically detect and isolate the main subject. This is the same API that powers the "lift subject from background" feature in Photos — it's fast, runs on-device, and handles edges surprisingly well. The isolated subject gets composited onto a **flat gray background** (RGB 0.5, 0.5, 0.5), which matches what TripoSR was trained on.
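
As a rough sketch of that step (the function name is illustrative, not from the app's source; assumes iOS 17+/macOS 14+ and the standard Vision and Core Image APIs):

```swift
import Vision
import CoreImage
import CoreImage.CIFilterBuiltins

// Isolate the main subject with Vision's foreground instance mask,
// then composite it onto the flat 50% gray that TripoSR expects.
func isolateSubject(in image: CIImage) throws -> CIImage? {
    let request = VNGenerateForegroundInstanceMaskRequest()
    let handler = VNImageRequestHandler(ciImage: image)
    try handler.perform([request])
    guard let observation = request.results?.first else { return nil }

    // Soft matte covering every detected foreground instance.
    let maskBuffer = try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances, from: handler)
    let mask = CIImage(cvPixelBuffer: maskBuffer)

    // Flat gray background (RGB 0.5, 0.5, 0.5), matching the training data.
    let gray = CIImage(color: CIColor(red: 0.5, green: 0.5, blue: 0.5))
        .cropped(to: image.extent)

    let blend = CIFilter.blendWithMask()
    blend.inputImage = image
    blend.backgroundImage = gray
    blend.maskImage = mask
    return blend.outputImage
}
```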
The second big win was **smart cropping and centering**. After removing the background, I analyze the remaining foreground pixels to find the bounding box, then scale and center the subject so it fills roughly **85-95% of the frame**. Too small and the model loses detail; too large and geometry gets clipped. The fill ratio adapts based on the object's shape — tall/narrow objects get a bit more breathing room, compact objects fill more of the frame. A small amount of padding (2-6%) prevents edge artifacts.
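
The geometry behind that heuristic is simple enough to sketch in a few lines. This is pure math (no Vision or Core Image), and the names, thresholds, and exact fill values are illustrative rather than the app's actual constants:

```swift
import Foundation

// Where the subject's scaled bounding box should land on the output canvas.
struct CropPlan {
    let scale: CGFloat    // uniform scale applied to the subject
    let origin: CGPoint   // top-left of the scaled subject on the canvas
}

func planCrop(subject: CGRect, canvas: CGFloat = 512) -> CropPlan {
    // Tall/narrow subjects get more breathing room than compact ones.
    let elongation = max(subject.width / subject.height,
                         subject.height / subject.width)
    let fill: CGFloat = elongation > 1.5 ? 0.85 : 0.95

    // Scale so the subject's longer side fills `fill` of the canvas.
    let longest = max(subject.width, subject.height)
    let scale = canvas * fill / longest

    // Center the scaled bounding box.
    let w = subject.width * scale
    let h = subject.height * scale
    let origin = CGPoint(x: (canvas - w) / 2, y: (canvas - h) / 2)
    return CropPlan(scale: scale, origin: origin)
}
```

Padding against edge artifacts then amounts to shrinking `fill` by a few percent before scaling.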
I also added a lightweight **image enhancement pipeline** before inference: noise reduction, luminance sharpening, and edge smoothing after the resize. Lanczos resampling (instead of bilinear) for the 512x512 resize made a noticeable difference in preserving fine detail. All of this runs through Core Image with Metal acceleration, so it adds minimal overhead.
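
A minimal version of that chain, using the built-in Core Image filters (the parameter values here are illustrative, not tuned):

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Noise reduction -> Lanczos resize to a square canvas -> luminance sharpen.
func enhance(_ input: CIImage, side: CGFloat = 512) -> CIImage {
    // Mild noise reduction before resizing.
    let denoise = CIFilter.noiseReduction()
    denoise.inputImage = input
    denoise.noiseLevel = 0.02
    denoise.sharpness = 0.4

    // Lanczos resampling; scale sets the height, aspectRatio corrects the
    // width so the output is side x side.
    let resize = CIFilter.lanczosScaleTransform()
    resize.inputImage = denoise.outputImage
    resize.scale = Float(side / input.extent.height)
    resize.aspectRatio = Float(input.extent.height / input.extent.width)

    // Sharpen luminance after the resize, not before.
    let sharpen = CIFilter.sharpenLuminance()
    sharpen.inputImage = resize.outputImage
    sharpen.sharpness = 0.4
    return sharpen.outputImage ?? input
}
```

Because the filters are chained lazily, Core Image fuses them into a single Metal render pass when the output is finally drawn.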
The full pipeline — background removal, crop, center, enhance, infer — runs entirely on-device in [Haplo AI](https://apps.apple.com/us/app/haplo-ai-offline-private-ai/id6746702574). No server, no internet required.
---
## Quick Start
<details open>