In this case, I trained an adapter per domain and then Karcher-merged them.

I'm not sure if any of the domains had a notably different effect; they all had basically the same results on evals.

However, the Karcher combination of them seems to have significantly lowered perplexity on lambada_openai, which is interesting enough to publish.
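
For context, the Karcher merge computes the Riemannian (Karcher) mean of the adapter weights, treating each flattened tensor as a point on a hypersphere and iterating log/exp maps until the mean direction stabilizes. Below is a minimal sketch of that fixed-point iteration; it's my own illustration of the general algorithm (the function name and the norm-rescaling choice are assumptions), not the exact merge code used for these checkpoints.

```python
import torch

def karcher_mean(tensors, iters=10, tol=1e-6):
    """Karcher (Riemannian) mean of same-shaped weight tensors.

    Illustration of the general algorithm, not the exact merge code used
    here. Each tensor is flattened and normalized onto the unit
    hypersphere; the mean direction is then refined by averaging in the
    tangent space (log map) and mapping back (exp map). The result is
    rescaled to the average input norm.
    """
    flats = [t.flatten().double() for t in tensors]
    norms = torch.stack([f.norm() for f in flats])
    units = [f / n for f, n in zip(flats, norms)]

    # Initial guess: normalized Euclidean mean.
    mu = torch.stack(units).mean(0)
    mu = mu / mu.norm()

    for _ in range(iters):
        # Log map: lift each point into the tangent space at mu.
        tangents = []
        for u in units:
            cos = torch.clamp(mu @ u, -1.0, 1.0)
            theta = torch.arccos(cos)
            if theta < 1e-12:
                tangents.append(torch.zeros_like(mu))
            else:
                tangents.append(theta / torch.sin(theta) * (u - cos * mu))
        step = torch.stack(tangents).mean(0)
        step_norm = step.norm()
        if step_norm < tol:
            break
        # Exp map: walk along the mean tangent direction on the sphere.
        mu = torch.cos(step_norm) * mu + torch.sin(step_norm) * step / step_norm
        mu = mu / mu.norm()

    return (norms.mean() * mu).reshape(tensors[0].shape).to(tensors[0].dtype)
```

Applied tensor-by-tensor across the per-domain adapters, this produces the merged adapter (mergekit, for example, ships a karcher merge method).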

Additionally, I attempted to implement MARA from https://im-ant.github.io/mara/ on the GRPO side to help preserve distribution entropy, though I'm unsure how correctly or usefully we did so.
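
For reference, the simplest way to push back on entropy collapse in policy-gradient training is an explicit entropy bonus on the loss. The sketch below shows that generic bonus wired into a GRPO-style loss; it is not MARA itself (see the link above for the actual method), and every name in it is made up for illustration.

```python
import torch.nn.functional as F

def grpo_loss_with_entropy_bonus(logits, actions, advantages, entropy_coef=0.01):
    """Policy-gradient loss with a generic entropy bonus (not MARA).

    logits:     [batch, seq, vocab] raw model outputs
    actions:    [batch, seq] sampled token ids
    advantages: [batch] group-normalized advantages, as in GRPO
    (padding/EOS masking omitted for brevity)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability of each sampled token.
    token_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Standard policy-gradient term, weighted by per-sample advantage.
    pg_loss = -(advantages.unsqueeze(-1) * token_logp).mean()
    # Token-distribution entropy; subtracting it from the loss rewards
    # keeping the output distribution spread out.
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    return pg_loss - entropy_coef * entropy
```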

| Task | Metric | Qwen3-4B-Base | GRPO-Merge | Δ Base | GRPO-Wave | Δ Base | Δ Merge | Style-Karcher | Δ Base | Δ Wave |
|:-----|:-------|:-------------:|:----------:|:------:|:---------:|:------:|:-------:|:-------------:|:------:|:------:|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% | 0.7912 | +0.27% | +0.53% | 0.7883 | -0.10% | -0.37% |
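
The task names above (arc_easy, lambada_openai) match lm-evaluation-harness tasks, so numbers like the base-model column should be reproducible with something along these lines. The harness itself is an assumption; the write-up doesn't say which eval tool produced the table.

```python
import lm_eval

# Assumption: EleutherAI's lm-evaluation-harness (pip install lm-eval)
# was the eval tool; the task names in the table match its registry.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen3-4B-Base",
    tasks=["arc_easy", "lambada_openai"],
    batch_size="auto",
)
print(results["results"])
```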