Some very interesting results on diversity as well:
| arxiv_cs | Pairwise diversity | 0.895 | **0.901** | +0.7% |

Additional experiment (done after quantization, so it affects further training but not existing quants): initializing the `<think>`/`</think>` tokens in embedding space.

Before: the two embeddings were identical (cos=1.0) at 0.3x the average embedding norm, untrained.
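
For context, those cos/norm numbers can be read straight off the embedding matrix. A minimal PyTorch sketch, assuming a Hugging Face-style `model` and `tokenizer` that already contain the two tokens (names here are illustrative):

```python
import torch.nn.functional as F

# Illustrative: `model`/`tokenizer` are assumed to be a HF causal LM and its
# tokenizer, with <think> and </think> already present as added tokens.
emb = model.get_input_embeddings().weight                # [vocab, dim]
a_id, b_id = tokenizer.convert_tokens_to_ids(["<think>", "</think>"])
a, b = emb[a_id], emb[b_id]

cos = F.cosine_similarity(a, b, dim=0).item()            # 1.0 = identical direction
rel_norm = (a.norm() / emb.norm(dim=1).mean()).item()    # e.g. ~0.3x before training
print(f"cos={cos:.2f}, norm={rel_norm:.2f}x avg")
```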

The two vectors were then optimized via AdamW on GSM8k reasoning traces with a 3-shot prefix, with the loss taken over the reasoning+answer tokens and the norms clamped to 1.5x the average embedding norm.
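
A rough sketch of what this embedding-only optimization could look like. The data pipeline (`gsm8k_traces`), label masking, and learning rate are assumptions for illustration, not the exact recipe:

```python
import torch

# Sketch only: hyperparameters and the `gsm8k_traces` loader are hypothetical.
emb = model.get_input_embeddings().weight
think_ids = tokenizer.convert_tokens_to_ids(["<think>", "</think>"])

for p in model.parameters():                 # freeze the whole model...
    p.requires_grad_(False)
emb.requires_grad_(True)                     # ...except the embedding matrix

# Mask gradients so only the two <think>/</think> rows ever update.
mask = torch.zeros_like(emb)
mask[think_ids] = 1.0
emb.register_hook(lambda g: g * mask)

opt = torch.optim.AdamW([emb], lr=1e-3)
max_norm = 1.5 * emb.detach().norm(dim=1).mean()   # clamp target: 1.5x avg norm

for batch in gsm8k_traces:                   # 3-shot prefix + reasoning + answer
    # labels are -100 on the prefix, so the loss covers reasoning+answer tokens
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    with torch.no_grad():                    # re-clamp rows that drifted past 1.5x
        norms = emb[think_ids].norm(dim=1, keepdim=True)
        emb[think_ids] *= (max_norm / norms).clamp(max=1.0)
```

Renormalizing after each step (rather than penalizing norm in the loss) is one simple way to enforce a hard 1.5x cap.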

After optimization: two distinct vectors (cos=0.07) at 1.5x norm. GSM8k 3-shot accuracy improved to 96.7% (29/30) vs 90.0% with the original embeddings, and CE loss improved by 7.8% on a held-out eval.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details