---
tags:
- merge
---

For this one, I (over)trained a SmolLM2-360M for 5 epochs, at a swept-for learning rate and rank on each of the target domains to fit style, then rewarded the model for lowering perplexity on the proxy model.

In this case, I trained an adapter per domain and then Karcher-merged them. I'm not sure whether any single domain had a notably different effect; they all had basically the same result on evals. However, the Karcher combination of them seems to have significantly lowered perplexity on lambada_openai, which is interesting enough to publish.
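Conceptually, a Karcher (Fréchet) mean averages the adapters along geodesics on the unit sphere instead of averaging them linearly. A minimal numpy sketch of that idea, assuming the standard iterative tangent-space algorithm (the function names and the delta-normalization scheme are illustrative, not mergekit's actual implementation):

```python
import numpy as np

def karcher_mean_sphere(points, iters=100, tol=1e-10):
    """Iterative Karcher (Frechet) mean of unit vectors on the hypersphere:
    log-map the points into the tangent space at the current estimate,
    average, then exp-map the mean tangent back onto the sphere."""
    mu = points.mean(axis=0)
    mu /= np.linalg.norm(mu)
    for _ in range(iters):
        tangents = np.zeros_like(points)
        for i, p in enumerate(points):
            cos = np.clip(mu @ p, -1.0, 1.0)
            theta = np.arccos(cos)            # geodesic distance from mu to p
            if theta > 1e-12:
                direction = p - cos * mu      # component of p orthogonal to mu
                tangents[i] = direction / np.linalg.norm(direction) * theta
        step = tangents.mean(axis=0)          # mean tangent vector at mu
        norm = np.linalg.norm(step)
        if norm < tol:
            break                             # converged: mu is the Karcher mean
        mu = np.cos(norm) * mu + np.sin(norm) * step / norm  # exp map
        mu /= np.linalg.norm(mu)
    return mu

def karcher_merge(deltas):
    """Merge per-domain weight deltas: normalize each flattened delta,
    take the Karcher mean of the directions, restore the average magnitude."""
    flat = np.stack([d.ravel() for d in deltas])
    norms = np.linalg.norm(flat, axis=1)
    mean_dir = karcher_mean_sphere(flat / norms[:, None])
    return (norms.mean() * mean_dir).reshape(deltas[0].shape)
```

Unlike a plain linear average, this keeps the merged delta on (roughly) the same norm scale as the inputs, which is one plausible reason geometric merges behave differently from weight averaging.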
| Task | Metric | Qwen3-4B-Base | GRPO-Merge | Δ Base | GRPO-Wave | Δ Base | Δ Merge | Style-Karcher | Δ Base | Δ Wave |
|:-----|:-------|:-------------:|:----------:|:------:|:---------:|:------:|:-------:|:-------------:|:------:|:------:|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% | 0.7912 | +0.27% | +0.53% | 0.7883 | -0.10% | -0.37% |
| arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% | 0.7643 | +0.45% | +0.50% | 0.7576 | -0.43% | -1.04% |
| lambada_openai | acc | 0.6912 | 0.6984 | +1.04% | 0.7006 | +1.36% | +0.31% | **0.7087** | **+2.53%** | +1.16% |
| lambada_openai | perplexity ↓ | 4.2433 | 4.0490 | -4.58% | 3.9616 | -6.64% | -2.16% | **3.8343** | **-9.63%** | -3.21% |
| openbookqa | acc | 0.3160 | 0.3180 | +0.63% | 0.3180 | +0.63% | ±0.00% | 0.3160 | ±0.00% | -0.63% |
| openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% | 0.4100 | ±0.00% | -0.49% | 0.4080 | -0.49% | -0.49% |
| piqa | acc | 0.7797 | 0.7807 | +0.13% | 0.7813 | +0.21% | +0.08% | 0.7786 | -0.14% | -0.35% |
| piqa | acc_norm | 0.7807 | 0.7807 | ±0.00% | 0.7813 | +0.08% | +0.08% | 0.7807 | ±0.00% | -0.08% |

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
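For anyone wanting to try a similar merge: assuming mergekit's Karcher merge method is available in your version, a config would look roughly like this (a sketch only; the adapter model paths are hypothetical placeholders, not the ones actually used here):

```yaml
# Hypothetical mergekit config for a Karcher merge of per-domain adapters.
merge_method: karcher
models:
  - model: ./style-adapter-domain-a   # placeholder path
  - model: ./style-adapter-domain-b   # placeholder path
dtype: bfloat16
```

Run with `mergekit-yaml config.yml ./merged-model` to produce the merged checkpoint.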