In this case, I trained an adapter per domain and then Karcher-merged them.

I'm not sure if any of the domains had a notably different effect; they all had basically the same results on evals.

However, the Karcher combination of them seems to have significantly lowered perplexity on lambada_openai, which is interesting enough to publish.
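
For context, the Karcher merge computes the Riemannian (Karcher) mean of the adapter weights, treating each flattened tensor as a point on a hypersphere and iterating log/exp maps until the mean direction stabilizes. Below is a minimal sketch of that fixed-point iteration; it's my own illustration of the general algorithm (the function name and the norm-rescaling choice are assumptions), not the exact merge code used for these checkpoints.

```python
import torch

def karcher_mean(tensors, iters=10, tol=1e-6):
    """Karcher (Riemannian) mean of same-shaped weight tensors.

    Illustration of the general algorithm, not the exact merge code used
    here. Each tensor is flattened and normalized onto the unit
    hypersphere; the mean direction is then refined by averaging in the
    tangent space (log map) and mapping back (exp map). The result is
    rescaled to the average input norm.
    """
    flats = [t.flatten().double() for t in tensors]
    norms = torch.stack([f.norm() for f in flats])
    units = [f / n for f, n in zip(flats, norms)]

    # Initial guess: normalized Euclidean mean.
    mu = torch.stack(units).mean(0)
    mu = mu / mu.norm()

    for _ in range(iters):
        # Log map: lift each point into the tangent space at mu.
        tangents = []
        for u in units:
            cos = torch.clamp(mu @ u, -1.0, 1.0)
            theta = torch.arccos(cos)
            if theta < 1e-12:
                tangents.append(torch.zeros_like(mu))
            else:
                tangents.append(theta / torch.sin(theta) * (u - cos * mu))
        step = torch.stack(tangents).mean(0)
        step_norm = step.norm()
        if step_norm < tol:
            break
        # Exp map: walk along the mean tangent direction on the sphere.
        mu = torch.cos(step_norm) * mu + torch.sin(step_norm) * step / step_norm
        mu = mu / mu.norm()

    return (norms.mean() * mu).reshape(tensors[0].shape).to(tensors[0].dtype)
```

Applied tensor-by-tensor across the per-domain adapters, this produces the merged adapter (mergekit, for example, ships a karcher merge method).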

Additionally, I attempted to implement MARA from https://im-ant.github.io/mara/ on the GRPO side to help preserve distribution entropy, though I'm unsure how correctly or usefully we did so.
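
For reference, the simplest way to push back on entropy collapse in policy-gradient training is an explicit entropy bonus on the loss. The sketch below shows that generic bonus wired into a GRPO-style loss; it is not MARA itself (see the link above for the actual method), and every name in it is made up for illustration.

```python
import torch.nn.functional as F

def grpo_loss_with_entropy_bonus(logits, actions, advantages, entropy_coef=0.01):
    """Policy-gradient loss with a generic entropy bonus (not MARA).

    logits:     [batch, seq, vocab] raw model outputs
    actions:    [batch, seq] sampled token ids
    advantages: [batch] group-normalized advantages, as in GRPO
    (padding/EOS masking omitted for brevity)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability of each sampled token.
    token_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Standard policy-gradient term, weighted by per-sample advantage.
    pg_loss = -(advantages.unsqueeze(-1) * token_logp).mean()
    # Token-distribution entropy; subtracting it from the loss rewards
    # keeping the output distribution spread out.
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    return pg_loss - entropy_coef * entropy
```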

| Task | Metric | Qwen3-4B-Base | GRPO-Merge | Δ Base | GRPO-Wave | Δ Base | Δ Merge | Style-Karcher | Δ Base | Δ Wave |
|:-----|:-------|:-------------:|:----------:|:------:|:---------:|:------:|:-------:|:-------------:|:------:|:------:|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% | 0.7912 | +0.27% | +0.53% | 0.7883 | -0.10% | -0.37% |
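
The task names above (arc_easy, lambada_openai) match lm-evaluation-harness tasks, so numbers like the base-model column should be reproducible with something along these lines. The harness itself is an assumption; the write-up doesn't say which eval tool produced the table.

```python
import lm_eval

# Assumption: EleutherAI's lm-evaluation-harness (pip install lm-eval)
# was the eval tool; the task names in the table match its registry.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen3-4B-Base",
    tasks=["arc_easy", "lambada_openai"],
    batch_size="auto",
)
print(results["results"])
```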