| Got it. Here's the v40 vs v41 comparison: |
|
|
| ``` |
|                       v40 (64×64)             v41 (256×256) |
|                       256 patches             4096 patches |
| ─────────────────────────────────────────────────────────────── |
| SVD BOTTLENECK |
| S spectrum:           [4.566, 3.915,          [4.540, 3.941, |
|                        2.891, 1.859]           2.879, 1.884] |
| S ratio (S0/SD):      2.456                   2.409 |
| Effective rank:       1.147                   1.180 |
| U ortho error:        3.41e-15                3.41e-15 |
| Vt ortho error:       6.47e-16                6.36e-16 |
| Recon error:          1.48e-15                1.47e-15 |
| Sphere radius:        1.0000 ± 4.3e-8         1.0000 ± 4.5e-8 |
| Energy:               [.434,.319,.174,.072]   [.430,.324,.173,.074] |
| Variance retention:   4.248×                  4.159× |
|
|
| CROSS-ATTENTION |
| Layer 0 relative Δ:   1.66%                   1.73% |
| Layer 0 qk_cos:       -0.294                  -0.247 |
| Layer 1 relative Δ:   1.20%                   1.21% |
| Layer 1 qk_cos:       +0.232                  +0.293 |
| Total coordination:   2.86%                   2.94% |
|
|
| CV PIPELINE |
| enc_in:               0.396                   0.410 |
| enc_block_0:          0.334                   0.350 |
| enc_block_1:          0.396                   0.490  ← SPIKE |
| enc_block_3:          0.308                   0.354 |
| svd_S_orig:           0.468                   1.313  ← MASSIVE JUMP |
| svd_S:                0.470                   0.786  ← cross-attn HALVES it |
| dec_in:               0.315                   0.364 |
| dec_block_3:          0.467                   0.392 |
|
|
| DECODER KURTOSIS |
| dec_in:               195.4                   187.0 |
| dec_block_3:          32.1                    15.1   ← smoother |
|
|
| END-TO-END |
| MSE:                  0.000007                0.000048 |
| Spearman:             0.999999                0.999982 |
| ``` |
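
The Energy and S ratio rows can be reproduced from the S spectrum alone, assuming energy means each squared singular value divided by the sum of squares and the ratio is the first singular value over the last. This is a minimal sketch with illustrative function names, not code from the actual pipeline:

```python
def energy_fractions(s):
    """Share of the total squared singular value carried by each component."""
    total = sum(x * x for x in s)
    return [x * x / total for x in s]


def s_ratio(s):
    """Largest singular value over the smallest (S0/SD in the table)."""
    return s[0] / s[-1]


# Spectra copied from the comparison table.
spectra = {
    "v40": [4.566, 3.915, 2.891, 1.859],
    "v41": [4.540, 3.941, 2.879, 1.884],
}

for name, s in spectra.items():
    fractions = [round(e, 3) for e in energy_fractions(s)]
    print(name, fractions, round(s_ratio(s), 3))
```

From the rounded spectrum the v41 ratio comes out at 2.410 rather than the 2.409 in the table, presumably because the table was computed from unrounded singular values.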
|
|
| **The big finding: CV at the SVD bottleneck.** |
|
|
| At 256 patches, S_orig CV = 0.468 and cross-attention barely touches it (0.470). At 4096 patches, S_orig CV **explodes to 1.313** and cross-attention **crushes it to 0.786**, a 40% reduction. That's the delegation at work. With 16× more patches, the singular values have much more variance across the spatial field, and cross-attention does real coordination to bring them into alignment. |
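
To make the arithmetic concrete: assuming CV here means coefficient of variation (standard deviation divided by mean, which this log does not spell out), the effect of cross-attention is the relative change from svd_S_orig to svd_S. The helper names below are hypothetical, and the toy list is made up purely to show the CV formula:

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed CV definition)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def relative_change(before, after):
    """Signed fractional change going from `before` to `after`."""
    return (after - before) / before


# CV values copied from the table: (svd_S_orig, svd_S).
cv_at_bottleneck = {
    "v40 (256 patches)": (0.468, 0.470),
    "v41 (4096 patches)": (1.313, 0.786),
}

for name, (before, after) in cv_at_bottleneck.items():
    print(f"{name}: {relative_change(before, after):+.1%}")

# Toy input (not real data) just to exercise the CV helper.
toy_values = [1.0, 2.0, 3.0]
print(f"toy CV: {coefficient_of_variation(toy_values):.3f}")
```

The v40 change is a fraction of a percent, while the v41 change is the roughly 40% drop described above.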
|
|
| The two-stage strategy sharpened too. Layer 0 qk_cos went from -0.294 to -0.247 (slightly less anti-correlated: more patches means less need to search far for complementary evidence). Layer 1 qk_cos went from +0.232 to +0.293 (MORE correlated: tighter refinement with similar neighbors when you have more to choose from). |
|
|
| The orthogonality, sphere radius, and recon error match across both resolutions to within machine epsilon. The mathematical core doesn't change. Only the coordination scales, exactly as designed. |
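
As a rough illustration of what these invariants measure, here is a pure-Python sketch on a small made-up orthogonal matrix. It assumes the ortho error is the largest deviation of UᵀU from the identity and the sphere radius is the Euclidean norm of each normalized vector; neither definition is given in this log, and the function names are hypothetical:

```python
import math


def householder(v):
    """Return the orthogonal reflection I - 2 v v^T / (v^T v) as nested lists."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]


def ortho_error(u):
    """Max absolute entry of U^T U - I (assumed definition of the ortho error)."""
    n = len(u[0])
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(row[a] * row[b] for row in u)
            worst = max(worst, abs(dot - (1.0 if a == b else 0.0)))
    return worst


def radius(vec):
    """Euclidean norm; equals 1 for a point on the unit sphere."""
    return math.sqrt(sum(x * x for x in vec))


# Arbitrary vector chosen for illustration only; any nonzero vector works.
u = householder([1.0, 2.0, 3.0, 4.0])

print(ortho_error(u))               # tiny: float64 round-off only
print([radius(row) for row in u])   # each very close to 1.0
```

An exactly orthogonal matrix would give zero error; what remains is floating-point round-off, which is why these numbers should not depend on the patch count.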