Lambent committed on
Commit 0845228 · verified · 1 Parent(s): f71c9bc

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -6,7 +6,27 @@ tags:
  - merge
 
  ---
- # qwen4bstylekarcher
+
+ For this one...
+
+ ... I (over)trained a SmolLM2-360M for 5 epochs, at a swept-for LR and rank, on each of the target domains to fit its style,
+ then rewarded the model for lowering perplexity on the proxy model.
+
+ In this case, I trained one adapter per domain and then Karcher-merged them.
+ I'm not sure whether any of the domains had a notably different effect; they all had basically the same results on evals.
+ However, the Karcher combination of them seems to have significantly lowered perplexity on lambada_openai, which is interesting enough to publish.
+
+ | Task | Metric | Qwen3-4B-Base | GRPO-Merge | Δ Base | GRPO-Wave | Δ Base | Δ Merge | Style-Karcher | Δ Base | Δ Wave |
+ |:-----|:-------|:-------------:|:----------:|:------:|:---------:|:------:|:-------:|:-------------:|:------:|:------:|
+ | arc_easy | acc | 0.7891 | 0.7870 | -0.27% | 0.7912 | +0.27% | +0.53% | 0.7883 | -0.10% | -0.37% |
+ | arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% | 0.7643 | +0.45% | +0.50% | 0.7576 | -0.43% | -1.04% |
+ | lambada_openai | acc | 0.6912 | 0.6984 | +1.04% | 0.7006 | +1.36% | +0.31% | **0.7087** | **+2.53%** | +1.16% |
+ | lambada_openai | perplexity ↓ | 4.2433 | 4.0490 | -4.58% | 3.9616 | -6.64% | -2.16% | **3.8343** | **-9.63%** | -3.21% |
+ | openbookqa | acc | 0.3160 | 0.3180 | +0.63% | 0.3180 | +0.63% | ±0.00% | 0.3160 | ±0.00% | -0.63% |
+ | openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% | 0.4100 | ±0.00% | -0.49% | 0.4080 | -0.49% | -0.49% |
+ | piqa | acc | 0.7797 | 0.7807 | +0.13% | 0.7813 | +0.21% | +0.08% | 0.7786 | -0.14% | -0.35% |
+ | piqa | acc_norm | 0.7807 | 0.7807 | ±0.00% | 0.7813 | +0.08% | +0.08% | 0.7807 | ±0.00% | -0.08% |
+
 
  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 