# Phase 1 Status — Pareto Exploration

## Timestamp: 2026-03-31 08:33 UTC (5.5 hours in)

## Summary
- **17 trials completed**, 8 still running (will finish within ~1h)
- **All 8 strategies tested** with at least 1 completed trial each
- **Pareto front established** from 8K to 10M params
- Data points: 13 seeds from previous work, plus 17 new Phase 1 completions so far

## Phase 1 Pareto Front (seeds + new, non-dominated)

| Params | val_loss | top1 | Strategy | Source |
|--------|----------|------|----------|--------|
| 8,434 | 2.7207 | 25.8% | sparse | phase1 10K steps |
| 16,384 | 2.4580 | 30.3% | bottleneck | phase1 10K steps |
| 65,536 | 2.3463 | 32.8% | lora | phase1 10K steps |
| 131,072 | 2.1118 | 37.7% | bottleneck | phase1 10K steps |
| 524,000 | 1.9200 | 41.7% | bottleneck | seed 100K steps |
| 1,000,000 | 1.8500 | 43.5% | bottleneck | seed 100K steps |
| 7,800,000 | 1.7759 | 45.4% | bottleneck | seed 10K steps |
| 8,390,656 | 1.7667 | 45.6% | unfreeze | phase1 10K steps |
| 10,000,000 | 1.5634 | 50.5% | bottleneck | seed 100K steps |
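
The non-dominated filter behind this table can be sketched as follows. This is a minimal illustration, not the script used for the report: the `Trial` type and `pareto_front` function are hypothetical names, and the film point uses the rounded 17K / 2.83 figures quoted in the findings. A trial stays on the front only if no trial with the same or fewer parameters reaches an equal or lower val_loss.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    params: int
    val_loss: float
    strategy: str


def pareto_front(trials):
    """Return the trials that are non-dominated on (params, val_loss), both minimized."""
    front = []
    best_loss = float("inf")
    # Sweep from smallest to largest; keep a trial only if it beats
    # the best loss seen among all trials that are no larger than it.
    for t in sorted(trials, key=lambda t: (t.params, t.val_loss)):
        if t.val_loss < best_loss:
            front.append(t)
            best_loss = t.val_loss
    return front


trials = [
    Trial(8_434, 2.7207, "sparse"),
    Trial(16_384, 2.4580, "bottleneck"),
    Trial(17_000, 2.83, "film"),  # approximate values from the findings, not a table row
    Trial(65_536, 2.3463, "lora"),
    Trial(131_072, 2.1118, "bottleneck"),
]
front = pareto_front(trials)
for t in front:
    print(f"{t.params:>9,}  {t.val_loss:.4f}  {t.strategy}")
```

The film trial is dropped because the 16,384-parameter bottleneck trial is both smaller and has a lower val_loss.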

## Key Findings

1. **Parameter budget carries 94% of the importance score** (Optuna param importance). Strategy choice contributes <1%.
2. **Bottleneck dominates** from 131K to 10M params: it holds 5 of the 6 front points in that range, the exception being unfreeze at 8.39M.
3. **LoRA is competitive below 131K**: at 65K params it is on the front (val_loss=2.3463).
4. **Unfreeze matches bottleneck at 8M+** (val_loss=1.7667 at 8.39M vs 1.7759 at 7.8M) but is impractical for hypernetworks.
5. **Film is the weakest strategy**: val_loss=2.83 at 17K params, far from the front (bottleneck reaches 2.4580 at 16K).
6. **Specialized CLM can't compete** with backbone-adapted models at any scale.
7. **LoRA with FFN** (rank=16, --lora-ffn) at 1.5M params currently shows val_loss=1.93 and may be the best non-bottleneck strategy.

## Strategy Rankings by Tier

| Tier | Best | Runner-up | Avoid |
|------|------|-----------|-------|
| <50K | bottleneck dim=1-4 | lora rank=1-3 | film, sparse |
| 50K-200K | bottleneck/lora (tied) | hybrid | rosa, sparse |
| 200K-2M | bottleneck | lora+FFN | specialized_clm |
| 2M-10M | bottleneck | unfreeze | rosa, sparse |

## Phase 2 Plan

Focus on head-to-head comparisons at key budget tiers with 50K steps:
1. **~65K**: bottleneck dim=8 L4-7 vs lora rank=2 all vs lora rank=4 L4-7
2. **~250K**: bottleneck dim=32 L4-7 vs lora rank=8 all
3. **~1M**: bottleneck dim=64 all vs lora rank=16+FFN vs bottleneck dim=128 L4-7
4. **~5M**: bottleneck dim=610 L4-7 vs unfreeze layer 7
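
As a rough check on how the bottleneck configurations above map to budget tiers, the adapter size can be estimated from the bottleneck width and the number of adapted layers. The sketch below rests on assumptions that this report does not state: a hidden width of 1024, bias-free down and up projections, "L4-7" meaning four layers, and "all" meaning eight layers. The function name `bottleneck_params` is hypothetical.

```python
def bottleneck_params(hidden: int, dim: int, n_layers: int) -> int:
    """Weights in one down-projection (hidden x dim) plus one
    up-projection (dim x hidden) per adapted layer, with no biases."""
    return 2 * hidden * dim * n_layers


HIDDEN = 1024  # assumption: hidden width is not given in this report

# (bottleneck dim, number of adapted layers); layer counts are assumptions
configs = {
    "dim=8 L4-7": (8, 4),
    "dim=32 L4-7": (32, 4),
    "dim=64 all": (64, 8),
    "dim=128 L4-7": (128, 4),
    "dim=610 L4-7": (610, 4),
}

counts = {name: bottleneck_params(HIDDEN, dim, n) for name, (dim, n) in configs.items()}
for name, count in counts.items():
    print(f"{name:<14} {count:>10,}")
```

Under these assumptions the five configurations come out at 65,536, 262,144, 1,048,576, 1,048,576, and 4,997,120 parameters, which is close to the ~65K, ~250K, ~1M, and ~5M tiers. If the real model differs, only `HIDDEN` and the layer counts need to change.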
