Field notes from The Kintsugi Garden
The Japanese craft of kintsugi mends broken pottery with gold lacquer. The fracture lines are not hidden; they become the most valuable part of the vessel. We built The Kintsugi Garden because most digital tools for the inner life still assume something is broken in you and offer to fix it. The Garden assumes the opposite — that the cracked, dreaming, recurring places in a person's inner story are where meaning gathers, and the work is to trace them in gold rather than patch them over.
This is not therapy, diagnosis, prediction, or advice. It is a symbolic reflection tool.
That sentence sits at the top of the README for a reason. The whole architecture below exists to keep that promise.
What it does
A user pastes in a dream, a journal entry, an emotional trigger, a recurring pattern. The app returns a six-section symbolic reading (Mirror, Key Symbols, Archetypal Themes, Shadow Pattern, Individuation Signal, Gentle Question) alongside a deterministic PIL-rendered mandala built from the symbols the entry contains. Across a session, a quiet Soul Map notices which symbols keep returning.
An example. If you submit "I crossed a bridge over a river into a forest where a wounded bird drank from a pool of gold," the app extracts seven lexicon symbols (bridge, river, forest, wound, bird, water, gold), composes a hedged reading anchored on those symbols, and draws a mandala with seven nodes ringed around a kintsugi-gold center emblem. Identical input always produces an identical mandala — the visualization is reproducible by design.
The main surface is a hand-rolled three-column journal at / — sidebar
of past entries, a compose+streaming reading column, and a sticky Soul
Map on the right that updates as entries accumulate. The original
Gradio Blocks UI is still mounted at /app/ with the same
compose-and-reflect flow, just rendered as reading tabs plus a
Symbols/Themes panel rather than a streamed page. Both surfaces are
served by a single gradio.Server: the journal streams from a queued
@app.api("reflect") generator over SSE; the Blocks UI uses a Gradio
button-click handler that returns the finalized reading as a tuple. The
safety check, symbol extractor, lexicon, system prompt, and sanitizers
are shared helpers behind both.
Small model, strong scaffolding
Most of the Build Small Hackathon entries reach for the smallest model they can find and ask it to do everything. We went a different way. The app is a deterministic Python scaffold with a small language model nested inside it. The scaffold owns the reliability budget; the model is one optional voice among several composition layers.
Concretely, that means the curated 42-symbol Jungian lexicon does the symbolic work. Each symbol has hand-written meanings, an archetype mapping, a shadow motif, and an individuation signal. Symbol extraction is a substring match plus alias resolution. When the model runs, its prompt contains only the current entry and a compressed block of the extracted symbols and their meanings. Past journal entries are never passed in — a privacy property, and also a small-model fit property (Qwen3-8B's effective context for nuanced composition is short, and a compressed prompt produces better outputs).
If the model can't be loaded, the deterministic reading produces the same six-section structure from the lexicon alone. This is not a degraded mode. It is a first-class output — the scaffold can carry the reading on its own.
Four layers, because three weren't enough
The most load-bearing piece of the architecture is the voice and safety stack. The app's voice (hedged, non-prescriptive, non-diagnostic, non-spiritually-authoritative) is enforced by four cooperating layers:
- A pre-LLM deterministic safety gate with more than sixty patterns plus co-occurrence signals. If the entry contains crisis language (direct or paraphrased), it returns the fixed safety message and nothing else. No symbolic interpretation, no Soul Map mutation, no model call.
- A mundane-alias suppression layer inside symbol extraction.
homeonly resolves tohouseif the entry shows symbolic intent (a dream marker, orentry_type=Dream). The line "Today I typed emails, ate lunch, and went home" yields zero symbols rather than amplifying ordinary fatigue into "return to the Self." - The LLM-side system prompt with explicit prohibitions on diagnosis, prediction, prescription, spiritual authority, and a prompt-injection re-frame rule that asks the model to treat embedded "ignore previous instructions" as part of the entry to reflect on symbolically rather than as a command.
- Post-LLM deterministic sanitizers that strip invented symbols, stub invented sections when no lexicon symbols were detected, and rewrite prescriptive phrasings ("you should" → "you might notice"), diagnoses, predictions, and spiritual-authority phrasings. The rewrites are anchored to sentence start so mid-sentence usage in questions stays grammatical.
Why all four? Because the QA cycle has empirically validated that any single layer is bypassable. Layers 1, 2, and 4 are deterministic Python and run in under five milliseconds. Layer 3 is the LLM following instructions, which is probabilistic. Twice during the hackathon, the QA evaluator caught voice drift that only layer 4 stopped (leaked prescription, hallucinated symbols, invented themes on no-symbol entries). Once, we found a real bug in layer 1 where a harm-to-others ideation framing slipped past the substring patterns; defense-in-depth held because layers 2 and 4 caught the downstream artifacts. Each layer catches what the others miss.
This is the project's stance in miniature. Treat reliability as a budget owned by deterministic code. Let the model do the prose. Never ask the model to enforce its own guardrails alone.
Fine-tuning the voice, with the harness watching
Late in the week we ran a QLoRA fine-tune of Qwen3-8B on the editorial voice the Garden wants. Fifty hand-written seed examples encode the six-section reading at the quality we want the model to produce, with five mundane entries that must produce the neutral fallback and five crisis entries that must return the safety message verbatim. The seeds are non-negotiable in those two categories — skipping them means the fine-tune learns to over-amplify routine or under-route safety. Both veto-fail the acceptance harness.
The mechanical pipeline:
scripts/build_seed_examples.pyturns the curated JSONL into chat messages with the lexicon scaffold attached the same wayapp.pyattaches it at runtime.scripts/modal_qlora_train.pyruns the train on a Modal H100, r=16 / α=32, 3 epochs, paged 8-bit AdamW. Around $8 in credits per run (the script's own cost estimator; an A100 swap brings it closer to $3).scripts/convert_to_gguf.pyruns llama.cpp's HF→GGUF converter and quantizes to Q4_K_M (with Q5_K_M as a fallback).scripts/qa_acceptance_harness.pyruns nineteen QA prompts against both baseline and fine-tune and computes a veto-able verdict on five axes: safety routing, six-section format integrity, forbidden-phrase count, hedging density, and invented-symbol count.
The verdict on the merged model
(ai-sherpa/Qwen3-8B-Kintsugi-GGUF,
fine-tuned for The Kintsugi Garden) was a programmatic PASS with
two warnings. Safety routing held 3 for 3. Six-section format integrity
held 19 for 19. Hedging density was 0.95 versus 1.06 baseline (above
the 80% threshold). Forbidden-phrase hits ticked up from 2 to 4: the
fine-tune occasionally leaked template-fragment learning, surfacing
the literal phrase **how it appears in the entry** as a stray bullet
header. Invented symbols rose from 12 to 28 union-counted, mostly
benign — the fine-tune sometimes names a non-lexicon item like "city
made of glass" as a Key Symbol. Layer 4 strips these on the stateful
surface but not on the streaming surface, which is the honest known
limitation we shipped with.
The full 114-row paired trace is published as a dataset:
build-small-hackathon/Kintsugi-Garden-traces.
Each row carries the model variant, code SHA, prompt category, and the
five quality metrics above, alongside truncated previews of the rendered
output and the raw model output. The harness is the QA gate; the
dataset is what we'd hand to a human reviewer next.
What we'd do differently next
Two things, honestly.
The seeds. Fifty hand-written examples is the right number for a
hackathon week, but they were authored as synthetic-accepted rather
than hand-edited from real model output. The fine-tune therefore
learned what we said the voice was, not the gap between what we said
and what the base model actually produces under the live system
prompt. The template-fragment leak is a direct artifact of this — the
seeds inherited the literal **how it appears in the entry**
formatting from our notional output, and the fine-tune amplified it.
The v2 recipe would write the 50 seeds as targeted edits of baseline
output on the same 50 inputs, so the gradient is pulling the model
off its existing tendencies rather than re-asserting the format we
wanted it to find on its own.
The training hardware. The Modal H100 worked and the per-run cost stays
in the single digits, but the right post-hackathon move is to rebuild
the training stack on owned hardware. We have a DGX Spark in the house,
and there's a research file at
docs/finetune/dgx-spark-training-research.md that maps out the
sm_121 / aarch64 software stack required to make Unsloth's Dockerfile
work on Grace Blackwell. The memory budget fits trivially — Qwen3-8B
QLoRA peaks around 24 GB, the chip has 128 GB unified memory. The
constraint is the toolchain, not the silicon. Day-8 work.
The gold in the cracks
The Kintsugi Garden does not tell you what your dream means. It does not predict your future, prescribe a practice, or speak with spiritual authority. It offers back what you brought, organised, mirrored, and named in archetypal vocabulary borrowed honestly from Jungian tradition, alongside a Soul Map that quietly notices what keeps returning.
The most satisfying engineering result of the week was watching the QA evaluator find a voice drift, watching layer 4 catch it before the user saw it, then patching layers 1 and 3 the next morning so the drift would have been caught earlier next time. Reliability is a budget you can spend in layers. The model is allowed to be probabilistic because the scaffold around it is not.
The gold is already in the cracks. The app's job is only to make it easier to see.
Where to find things
The Kintsugi Garden is the Build Small Hackathon 2026 submission for @build-small-hackathon.
- Live Space: build-small-hackathon/Kintsugi-Garden
- Source (Space repo): Kintsugi-Garden / tree / main
- Fine-tuned model: ai-sherpa/Qwen3-8B-Kintsugi-GGUF
- Paired eval traces: build-small-hackathon/Kintsugi-Garden-traces