---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/0z0yzAGcnw36qJ51eDl4Z.png
library_name: transformers
tags:
- mergekit
- merge

---
# CavesOfQwen3-8b

> Hey Hey, Model Gang, KaraWitch Here.  
> Have you ever merged too deeply,  
> and found something 'they' don't want you to know?  
> "[CavesOfQwen3](https://youtu.be/o_PBfLbd3zw)", who is she? And why can't I reach her?

![image/png](https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/0z0yzAGcnw36qJ51eDl4Z.png)

CavesOfQwen3-8b is a merge of the Qwen3-8B instruct model ([Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)) with its base model, [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base).

The idea behind this merge is to strip the overbaked feel of the instruct tune while keeping its instruction-following intact.  

This is a merge of pre-trained language models created using ~~mergekit~~ mergekitty, with a couple of code patches to add the Qwen3 architecture and an `o_proj` entry to its architecture configuration (otherwise vLLM gets very grumpy about it).  

I used `TIES`. Not because I'm lazy, but because it's what I had lying around that isn't `SCE` or something else.
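
The result should load like any other Qwen3 checkpoint with `transformers`. A minimal sketch; the repo id below is an assumption, so substitute the actual path of this merge:

```python
# Minimal loading sketch with transformers. The repo id is assumed, not
# confirmed by this card -- point it at wherever this merge actually lives.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KaraKaraWitch/CavesOfQwen3-8b"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Who is CavesOfQwen3?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False,  # Qwen3's documented thinking-mode switch
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```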

## Model Results (Thanks to [@SmerkyG](/SmerkyG))

|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu                                   |      2|none  |      |acc   |↑  |0.7478|±  |0.0034|

|    Tasks     |Version|Filter|n-shot|  Metric  |   |Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|-----:|---|-----:|
|arc_challenge |      1|none  |     0|acc       |↑  |0.5392|±  |0.0146|
|              |       |none  |     0|acc_norm  |↑  |0.5768|±  |0.0144|
|arc_easy      |      1|none  |     0|acc       |↑  |0.8178|±  |0.0079|
|              |       |none  |     0|acc_norm  |↑  |0.7963|±  |0.0083|
|hellaswag     |      1|none  |     0|acc       |↑  |0.5906|±  |0.0049|
|              |       |none  |     0|acc_norm  |↑  |0.7868|±  |0.0041|
|lambada_openai|      1|none  |     0|acc       |↑  |0.7357|±  |0.0061|
|              |       |none  |     0|perplexity|↓  |3.3203|±  |0.0674|
|piqa          |      1|none  |     0|acc       |↑  |0.7933|±  |0.0094|
|              |       |none  |     0|acc_norm  |↑  |0.7922|±  |0.0095|
|sciq          |      1|none  |     0|acc       |↑  |0.9630|±  |0.0060|
|              |       |none  |     0|acc_norm  |↑  |0.9570|±  |0.0064|
|winogrande    |      1|none  |     0|acc       |↑  |0.7182|±  |0.0126|
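
The tables above are in lm-evaluation-harness output format; assuming that harness produced them, a run along these lines should reproduce the zero-shot block (repo id again assumed):

```python
# Sketch of reproducing the zero-shot table with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=KaraKaraWitch/CavesOfQwen3-8b,dtype=bfloat16",  # hypothetical repo id
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag",
        "lambada_openai", "piqa", "sciq", "winogrande",
    ],
    num_fewshot=0,
)
print(results["results"])
```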



## Merge Details
### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the base.
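
For intuition, TIES works per tensor in three steps: trim each task vector down to its largest-magnitude entries (controlled by `density`), elect a majority sign per parameter, then average only the entries that agree with the vote. A toy sketch of the idea, not mergekitty's actual code:

```python
# Toy per-tensor TIES sketch: trim, elect sign, disjoint merge.
# Illustrative only; real implementations handle dtypes, sharding, etc.
import torch

def ties_merge_tensor(base, finetunes, densities, weights):
    # Task vectors: how each fine-tune differs from the base.
    deltas = [ft - base for ft in finetunes]
    # 1. Trim: keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for delta, density in zip(deltas, densities):
        k = max(1, int(delta.numel() * density))
        cutoff = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        trimmed.append(torch.where(delta.abs() >= cutoff, delta, torch.zeros_like(delta)))
    # 2. Elect sign: majority sign of the weighted sum of trimmed deltas.
    elected = torch.sign(sum(w * t for w, t in zip(weights, trimmed)))
    # 3. Disjoint merge: average only entries whose sign agrees with the vote,
    #    normalizing by the weight that actually contributed (cf. `normalize: true`).
    num = torch.zeros_like(base)
    den = torch.zeros_like(base)
    for w, t in zip(weights, trimmed):
        agrees = (torch.sign(t) == elected) & (t != 0)
        num += w * t * agrees
        den += w * agrees
    return base + num / den.clamp(min=1e-8)
```

With the config below, task vectors are taken relative to `Qwen/Qwen3-8B`, so the `Qwen3-8B-Base` entry (weight 1, density 0.7) does most of the pulling back toward base-model behaviour.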

### Models Merged

The following models were included in the merge:
* [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Qwen/Qwen3-8B
    parameters:
      density: 0.4
      weight: 0.35
  - model: Qwen/Qwen3-8B-Base
    parameters:
      density: 0.7
      weight: 1

merge_method: ties
base_model: Qwen/Qwen3-8B
parameters:
  normalize: true
dtype: bfloat16
```
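
For reference, mergekit's documented Python entry point runs a config like this as follows; the sketch assumes the mergekitty fork keeps the same `run_merge` interface and module layout:

```python
# Running the YAML above through the (assumed mergekit-compatible) Python API.
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("cavesofqwen3.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./CavesOfQwen3-8b",  # output directory
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```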

### Disclaimer

> CavesOfQwen3 and its creator are not affiliated with Caves of Qud or the creator of the linked video.  
> The reference is intentional, but it is meant as a lighthearted joke.  
> There's no need to read into it any deeper than "Haha, funni name."  
> This disclaimer is for those who think otherwise, or for the overthinkers.