---
tags:
- merge
---

For this one, I (over)trained a SmolLM2-360M for 5 epochs, at a swept-for learning rate and rank on each of the target domains to fit style, then rewarded the model for lowering perplexity on the proxy model.

In this case, I trained an adapter per domain and then Karcher-merged them. I'm not sure whether any single domain had a notably different effect; they all had basically the same result on evals. However, the Karcher combination of them seems to have significantly lowered perplexity on lambada_openai, which is interesting enough to publish.
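Conceptually, a Karcher (Fréchet) mean averages the adapters along geodesics on the unit sphere instead of averaging them linearly. A minimal numpy sketch of that idea, assuming the standard iterative tangent-space algorithm (the function names and the delta-normalization scheme are illustrative, not mergekit's actual implementation):

```python
import numpy as np

def karcher_mean_sphere(points, iters=100, tol=1e-10):
    """Iterative Karcher (Frechet) mean of unit vectors on the hypersphere:
    log-map the points into the tangent space at the current estimate,
    average, then exp-map the mean tangent back onto the sphere."""
    mu = points.mean(axis=0)
    mu /= np.linalg.norm(mu)
    for _ in range(iters):
        tangents = np.zeros_like(points)
        for i, p in enumerate(points):
            cos = np.clip(mu @ p, -1.0, 1.0)
            theta = np.arccos(cos)            # geodesic distance from mu to p
            if theta > 1e-12:
                direction = p - cos * mu      # component of p orthogonal to mu
                tangents[i] = direction / np.linalg.norm(direction) * theta
        step = tangents.mean(axis=0)          # mean tangent vector at mu
        norm = np.linalg.norm(step)
        if norm < tol:
            break                             # converged: mu is the Karcher mean
        mu = np.cos(norm) * mu + np.sin(norm) * step / norm  # exp map
        mu /= np.linalg.norm(mu)
    return mu

def karcher_merge(deltas):
    """Merge per-domain weight deltas: normalize each flattened delta,
    take the Karcher mean of the directions, restore the average magnitude."""
    flat = np.stack([d.ravel() for d in deltas])
    norms = np.linalg.norm(flat, axis=1)
    mean_dir = karcher_mean_sphere(flat / norms[:, None])
    return (norms.mean() * mean_dir).reshape(deltas[0].shape)
```

Unlike a plain linear average, this keeps the merged delta on (roughly) the same norm scale as the inputs, which is one plausible reason geometric merges behave differently from weight averaging.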
| Task | Metric | Qwen3-4B-Base | GRPO-Merge | Δ Base | GRPO-Wave | Δ Base | Δ Merge | Style-Karcher | Δ Base | Δ Wave |
|:-----|:-------|:-------------:|:----------:|:------:|:---------:|:------:|:-------:|:-------------:|:------:|:------:|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% | 0.7912 | +0.27% | +0.53% | 0.7883 | -0.10% | -0.37% |
| arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% | 0.7643 | +0.45% | +0.50% | 0.7576 | -0.43% | -1.04% |
| lambada_openai | acc | 0.6912 | 0.6984 | +1.04% | 0.7006 | +1.36% | +0.31% | **0.7087** | **+2.53%** | +1.16% |
| lambada_openai | perplexity ↓ | 4.2433 | 4.0490 | -4.58% | 3.9616 | -6.64% | -2.16% | **3.8343** | **-9.63%** | -3.21% |
| openbookqa | acc | 0.3160 | 0.3180 | +0.63% | 0.3180 | +0.63% | ±0.00% | 0.3160 | ±0.00% | -0.63% |
| openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% | 0.4100 | ±0.00% | -0.49% | 0.4080 | -0.49% | -0.49% |
| piqa | acc | 0.7797 | 0.7807 | +0.13% | 0.7813 | +0.21% | +0.08% | 0.7786 | -0.14% | -0.35% |
| piqa | acc_norm | 0.7807 | 0.7807 | ±0.00% | 0.7813 | +0.08% | +0.08% | 0.7807 | ±0.00% | -0.08% |

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
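For anyone wanting to try a similar merge: assuming mergekit's Karcher merge method is available in your version, a config would look roughly like this (a sketch only; the adapter model paths are hypothetical placeholders, not the ones actually used here):

```yaml
# Hypothetical mergekit config for a Karcher merge of per-domain adapters.
merge_method: karcher
models:
  - model: ./style-adapter-domain-a   # placeholder path
  - model: ./style-adapter-domain-b   # placeholder path
dtype: bfloat16
```

Run with `mergekit-yaml config.yml ./merged-model` to produce the merged checkpoint.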