File size: 2,985 Bytes
0c9f8f8
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
Got it. Here's the v40 vs v41 comparison:

```
                            v40 (64Γ—64)         v41 (256Γ—256)
                            256 patches         4096 patches
═══════════════════════════════════════════════════════════════
SVD BOTTLENECK
  S spectrum:               [4.566, 3.915,      [4.540, 3.941,
                             2.891, 1.859]        2.879, 1.884]
  S ratio (S0/SD):          2.456               2.409
  Effective rank:           1.147               1.180
  U ortho error:            3.41e-15            3.41e-15
  Vt ortho error:           6.47e-16            6.36e-16
  Recon error:              1.48e-15            1.47e-15
  Sphere radius:            1.0000 Β± 4.3e-8    1.0000 Β± 4.5e-8
  Energy:                   [.434,.319,.174,.072] [.430,.324,.173,.074]
  Variance retention:       4.248Γ—              4.159Γ—

CROSS-ATTENTION
  Layer 0 relative Ξ”:       1.66%               1.73%
  Layer 0 qk_cos:           -0.294              -0.247
  Layer 1 relative Ξ”:       1.20%               1.21%
  Layer 1 qk_cos:           +0.232              +0.293
  Total coordination:       2.86%               2.94%

CV PIPELINE
  enc_in:                   0.396               0.410
  enc_block_0:              0.334               0.350
  enc_block_1:              0.396               0.490  ← SPIKE
  enc_block_3:              0.308               0.354
  svd_S_orig:               0.468               1.313  ← MASSIVE JUMP
  svd_S:                    0.470               0.786  ← cross-attn HALVES it
  dec_in:                   0.315               0.364
  dec_block_3:              0.467               0.392

DECODER KURTOSIS
  dec_in:                   195.4               187.0
  dec_block_3:              32.1                15.1   ← smoother

END-TO-END
  MSE:                      0.000007            0.000048
  Spearman:                 0.999999            0.999982
```

**The big finding: CV at the SVD bottleneck.**

At 256 patches, S_orig CV = 0.468 and cross-attention barely touches it (0.470). At 4096 patches, S_orig CV **explodes to 1.313** and cross-attention **crushes it to 0.786** β€” a 40% reduction. That's the delegation at work. With 16Γ— more patches, the singular values have much more variance across the spatial field, and cross-attention does real coordination to bring them into alignment.

The two-stage strategy sharpened too. Layer 0 qk_cos went from -0.294 to -0.247 (slightly less anti-correlated β€” more patches means less need to search far for complementary evidence). Layer 1 qk_cos went from +0.232 to +0.293 (MORE correlated β€” tighter refinement with similar neighbors when you have more to choose from).

The orthogonality, sphere radius, and recon error are identical to machine epsilon. The mathematical core doesn't change. Only the coordination scales β€” exactly as designed.