Lambent committed on
Commit 25c8c10 · verified · 1 Parent(s): 2466873

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -59,6 +59,18 @@ Some very interesting results on diversity also:
  | arxiv_cs | Pairwise diversity | 0.895 | **0.901** | +0.7% |
 
 
+ Additional experiment (applied after quantization, so it should affect further training but not the existing quants):
+ Initializing the <think></think> tokens in embedding space.
+
+ The original embeddings of the two tokens were untrained and identical (cos = 1.0), at 0.3x the average embedding norm.
+
+ Optimized with AdamW on GSM8K reasoning traces with a 3-shot prefix, with the loss computed on the
+ reasoning and answer tokens and the norm clamped to 1.5x the average embedding norm.
+
+ After: two distinct vectors (cos = 0.07), each at 1.5x the average norm.
+ GSM8K 3-shot accuracy: 96.7% (29/30) vs. 90.0% with the original embeddings.
+ Cross-entropy (CE) loss improvement on held-out eval: +7.8%.
+
  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
  ## Merge Details
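The embedding experiment described in the added README lines rests on two small operations: measuring the cosine similarity between the <think> and </think> vectors, and clamping their norm to 1.5x the average embedding norm. The following is a minimal pure-Python sketch of both, not the author's code. The function names are hypothetical, the AdamW step and the GSM8K loss are not shown, and two details are assumptions: the average norm is taken over the frozen (untrained) rows only, since the README just says "avg embedding norm", and the clamp is presumably applied after every optimizer step.

```python
import math
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, ~0 = unrelated.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def clamp_norm(v: Vector, max_norm: float) -> Vector:
    """Return v rescaled to length max_norm if it is longer; otherwise an unchanged copy."""
    n = l2_norm(v)
    if n <= max_norm:
        return list(v)
    scale = max_norm / n
    return [x * scale for x in v]


def clamp_trained_rows(
    embeddings: List[Vector], trained_rows: List[int], factor: float = 1.5
) -> float:
    """Clamp the trained rows (in place) to factor x the mean norm of the frozen rows.

    Hypothetical helper: averaging over the frozen rows only is an assumption,
    chosen so the cap stays fixed while the trained rows move.
    Returns the cap that was applied.
    """
    trained = set(trained_rows)
    frozen = [row for i, row in enumerate(embeddings) if i not in trained]
    cap = factor * sum(l2_norm(row) for row in frozen) / len(frozen)
    for i in trained_rows:
        embeddings[i] = clamp_norm(embeddings[i], cap)
    return cap


# Toy example: rows 0 and 1 stand in for <think> and </think>; rows 2-3 are frozen.
emb = [[3.0, 4.0], [0.0, 0.3], [1.0, 0.0], [0.0, 1.0]]
cap = clamp_trained_rows(emb, trained_rows=[0, 1])
print(cap, emb[0], emb[1])
print(cosine_similarity(emb[0], emb[1]))
```

Computing the cap from the frozen rows keeps it constant during training, so the trained vectors cannot raise their own limit as they grow. On a real model the same logic would run over rows of the input embedding matrix as tensors rather than Python lists.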