---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/0z0yzAGcnw36qJ51eDl4Z.png
library_name: transformers
tags:
- mergekit
- merge
---
# CavesOfQwen3-8b
> Hey Hey, Model Gang, KaraWitch Here.
> Have you ever merged too deeply,
> And found something 'they' don't want you to know?
> "[CavesOfQwen3](https://youtu.be/o_PBfLbd3zw)", who is she? And why can't I reach her?

CavesOfQwen3-8b is a merge of Qwen3-8B (the instruct model) with its base model, Qwen3-8B-Base.
The idea behind this merge is to remove the overbaked feeling of the instruct model while retaining its instruction-following behaviour.
This is a merge of pre-trained language models created using ~~mergekit~~.
This model was made with mergekitty, with a couple of code patches to add Qwen3 support and an `o_proj` entry to the Qwen3 arch configuration (otherwise vLLM gets very grumpy about it).
I used `TIES`. Not because I'm lazy, but because it's what I had lying around that isn't `SCE` or something else.
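If you just want to run the thing, here's a minimal loading sketch with transformers (a recent version with Qwen3 support). The repo id is a placeholder of mine, not necessarily where this model actually lives; generation settings are illustrative:

```python
# Minimal sketch for loading the merged model with transformers.
# REPO_ID is a hypothetical placeholder; point it at this repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "KaraKaraWitch/CavesOfQwen3-8b"  # hypothetical id; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    torch_dtype=torch.bfloat16,  # the merge itself was produced in bfloat16
    device_map="auto",
)

messages = [{"role": "user", "content": "Who is CavesOfQwen3, and why can't I reach her?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```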
## Model Results (Thanks to [@SmerkyG](/SmerkyG))
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7478|± |0.0034|
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------------|------:|------|-----:|----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.5392|± |0.0146|
| | |none | 0|acc_norm |↑ |0.5768|± |0.0144|
|arc_easy | 1|none | 0|acc |↑ |0.8178|± |0.0079|
| | |none | 0|acc_norm |↑ |0.7963|± |0.0083|
|hellaswag | 1|none | 0|acc |↑ |0.5906|± |0.0049|
| | |none | 0|acc_norm |↑ |0.7868|± |0.0041|
|lambada_openai| 1|none | 0|acc |↑ |0.7357|± |0.0061|
| | |none | 0|perplexity|↓ |3.3203|± |0.0674|
|piqa | 1|none | 0|acc |↑ |0.7933|± |0.0094|
| | |none | 0|acc_norm |↑ |0.7922|± |0.0095|
|sciq | 1|none | 0|acc |↑ |0.9630|± |0.0060|
| | |none | 0|acc_norm |↑ |0.9570|± |0.0064|
|winogrande | 1|none | 0|acc |↑ |0.7182|± |0.0126|
## Merge Details
### Merge Method
This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the base.
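For intuition, here's a toy single-tensor sketch of the TIES idea (trim each task vector by magnitude, elect a per-entry sign, then take a disjoint mean over agreeing entries). Names are mine and this is a simplification, not mergekit's exact implementation:

```python
import torch

def ties_merge_tensor(base, donors, densities, weights):
    """Toy single-tensor TIES: trim -> elect sign -> disjoint merge.

    Illustrative only; the real merge repeats this for every tensor
    in the checkpoints and handles many more edge cases.
    """
    trimmed = []
    for tuned, density, w in zip(donors, densities, weights):
        delta = tuned - base                      # task vector
        k = max(1, int(density * delta.numel()))  # entries to keep
        cutoff = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        delta = torch.where(delta.abs() >= cutoff, delta, torch.zeros_like(delta))
        trimmed.append(w * delta)

    stacked = torch.stack(trimmed)
    elected = stacked.sum(dim=0).sign()           # majority sign per entry
    agree = stacked.sign() == elected             # drop disagreeing entries
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged
```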
### Models Merged
The following models were included in the merge:
* [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: Qwen/Qwen3-8B
    parameters:
      density: 0.4
      weight: 0.35
  - model: Qwen/Qwen3-8B-Base
    parameters:
      density: 0.7
      weight: 1
merge_method: ties
base_model: Qwen/Qwen3-8B
parameters:
  normalize: true
dtype: bfloat16
```
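Reproducing a merge like this is a one-liner on the CLI: `mergekit-yaml caves.yaml ./CavesOfQwen3-8b` (assuming the config above is saved as `caves.yaml`). The rough Python-API equivalent, assuming mergekitty mirrors upstream mergekit's API here, looks like:

```python
# Sketch of reproducing the merge via mergekit's Python API. mergekitty is a
# fork, so treat these import paths as an assumption about upstream mergekit.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("caves.yaml", encoding="utf-8") as fp:  # the YAML config above
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./CavesOfQwen3-8b",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```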
### Disclaimer
> CavesOfQwen3 and its creator are not affiliated with Caves of Qud or the creator of the linked video.
> The reference is intentional, but it's meant to be taken as a lighthearted joke.
> There's no need to read into it any more deeply than "Haha, funni name."
> This disclaimer is for those who think otherwise, or for the overthinkers.