File size: 2,985 Bytes
0c9f8f8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | Got it. Here's the v40 vs v41 comparison:
```
v40 (64Γ64) v41 (256Γ256)
256 patches 4096 patches
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
SVD BOTTLENECK
S spectrum: [4.566, 3.915, [4.540, 3.941,
2.891, 1.859] 2.879, 1.884]
S ratio (S0/SD): 2.456 2.409
Effective rank: 1.147 1.180
U ortho error: 3.41e-15 3.41e-15
Vt ortho error: 6.47e-16 6.36e-16
Recon error: 1.48e-15 1.47e-15
Sphere radius: 1.0000 Β± 4.3e-8 1.0000 Β± 4.5e-8
Energy: [.434,.319,.174,.072] [.430,.324,.173,.074]
Variance retention: 4.248Γ 4.159Γ
CROSS-ATTENTION
Layer 0 relative Ξ: 1.66% 1.73%
Layer 0 qk_cos: -0.294 -0.247
Layer 1 relative Ξ: 1.20% 1.21%
Layer 1 qk_cos: +0.232 +0.293
Total coordination: 2.86% 2.94%
CV PIPELINE
enc_in: 0.396 0.410
enc_block_0: 0.334 0.350
enc_block_1: 0.396 0.490 β SPIKE
enc_block_3: 0.308 0.354
svd_S_orig: 0.468 1.313 β MASSIVE JUMP
svd_S: 0.470 0.786 β cross-attn HALVES it
dec_in: 0.315 0.364
dec_block_3: 0.467 0.392
DECODER KURTOSIS
dec_in: 195.4 187.0
dec_block_3: 32.1 15.1 β smoother
END-TO-END
MSE: 0.000007 0.000048
Spearman: 0.999999 0.999982
```
**The big finding: CV at the SVD bottleneck.**
At 256 patches, S_orig CV = 0.468 and cross-attention barely touches it (0.470). At 4096 patches, S_orig CV **explodes to 1.313** and cross-attention **crushes it to 0.786** β a 40% reduction. That's the delegation at work. With 16Γ more patches, the singular values have much more variance across the spatial field, and cross-attention does real coordination to bring them into alignment.
The two-stage strategy sharpened too. Layer 0 qk_cos went from -0.294 to -0.247 (slightly less anti-correlated β more patches means less need to search far for complementary evidence). Layer 1 qk_cos went from +0.232 to +0.293 (MORE correlated β tighter refinement with similar neighbors when you have more to choose from).
The orthogonality, sphere radius, and recon error are identical to machine epsilon. The mathematical core doesn't change. Only the coordination scales β exactly as designed. |