# Qwen3.5-27B-Omnimerge-GGUF
GGUF quantizations of ManniX-ITA/Qwen3.5-27B-Omnimerge — a 3-way Task Arithmetic weight-space merge of three Qwen3.5-27B reasoning-distilled fine-tunes.
This merge outperforms its strongest source model (Claude-4.6-Opus-Reasoning-Distilled) on every tested benchmark: +8.1 pp on GPQA Diamond reasoning, +3.7 pp on HumanEval, and +0.6 pp on MBPP.
All quantizations were produced with an importance matrix (imatrix) computed from calibration data v5.
## Benchmark Results (Q6_K)
| Benchmark | Omnimerge | Claude-distill (best source) | Delta |
|---|---|---|---|
| GPQA Diamond (198q, flex) | 61.11% | 53.03% | +8.08 pp |
| HumanEval pass@1 | 79.88% | 76.22% | +3.66 pp |
| MBPP pass@1 | 71.80% | 71.20% | +0.60 pp |
## Available Quantizations
| Quantization | File | Size |
|---|---|---|
| Q8_0 | merged_omnimerge-Q8_0.gguf | 26.63 GB |
| Q6_K_L | merged_omnimerge-Q6_K_L.gguf | 21.14 GB |
| Q6_K | merged_omnimerge-Q6_K.gguf | 20.57 GB |
| Q5_K_L | merged_omnimerge-Q5_K_L.gguf | 18.64 GB |
| Q5_K_M | merged_omnimerge-Q5_K_M.gguf | 17.91 GB |
| Q5_K_S | merged_omnimerge-Q5_K_S.gguf | 17.40 GB |
| Q4_K_L | merged_omnimerge-Q4_K_L.gguf | 16.29 GB |
| Q4_1 | merged_omnimerge-Q4_1.gguf | 15.91 GB |
| Q4_K_M | merged_omnimerge-Q4_K_M.gguf | 15.41 GB |
| IQ4_NL | merged_omnimerge-IQ4_NL.gguf | 14.72 GB |
| Q4_K_S | merged_omnimerge-Q4_K_S.gguf | 14.52 GB |
| Q4_0 | merged_omnimerge-Q4_0.gguf | 14.41 GB |
| IQ4_XS | merged_omnimerge-IQ4_XS.gguf | 14.05 GB |
| Q3_K_XL | merged_omnimerge-Q3_K_XL.gguf | 13.42 GB |
| Q3_K_L | merged_omnimerge-Q3_K_L.gguf | 13.36 GB |
| Q3_K_M | merged_omnimerge-Q3_K_M.gguf | 12.39 GB |
| IQ3_M | merged_omnimerge-IQ3_M.gguf | 11.72 GB |
| Q3_K_S | merged_omnimerge-Q3_K_S.gguf | 11.24 GB |
| IQ3_XS | merged_omnimerge-IQ3_XS.gguf | 11.15 GB |
| IQ3_XXS | merged_omnimerge-IQ3_XXS.gguf | 10.42 GB |
## Skipped Quantizations (failed sanity check)
The following 2-bit quantizations were attempted but failed the sanity check (three capital-city questions answered incorrectly or incoherently). At 2-bit precision on a 27B model, too much information is lost for reliable output, so these quants are intentionally not published:
- Q2_K_L, Q2_K, IQ2_M, IQ2_S, IQ2_XS, IQ2_XXS
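The pass/fail gate above can be sketched as a simple answer check. The exact questions and grading used for this repo are not published, so the questions and matching rule below are illustrative assumptions only:

```python
# Minimal sketch of the capital-city sanity check described above.
# The real questions and grading are not published; these are stand-ins.
QUESTIONS = {
    "What is the capital of France?": "paris",
    "What is the capital of Japan?": "tokyo",
    "What is the capital of Canada?": "ottawa",
}

def passes_sanity_check(answers: dict) -> bool:
    """A quant passes only if every answer contains the expected capital."""
    return all(
        expected in answers.get(question, "").lower()
        for question, expected in QUESTIONS.items()
    )

# A coherent model passes; a 2-bit quant emitting word salad fails.
good = {q: f"The capital is {c.title()}." for q, c in QUESTIONS.items()}
bad = {q: "the the the capital capital" for q in QUESTIONS}
print(passes_sanity_check(good), passes_sanity_check(bad))
```

In practice each question would be sent to the quantized model (e.g. via `llama-cli` or a running `llama-server`) and the generated text fed into a check like this.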
## Recommended Usage
```shell
llama-server -m Qwen3.5-27B-Omnimerge-Q6_K.gguf -c 32768 -ngl 99 \
  --jinja --reasoning-format deepseek --reasoning-budget 16384 \
  --temp 0.6 --top-p 0.95 --top-k 20 --dry-multiplier 0.5
```
For code tasks, run without `--jinja --reasoning-format deepseek` (plain completions mode).
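Once the server is running, its OpenAI-compatible `/v1/chat/completions` endpoint accepts the same sampling settings in the request body. A minimal payload sketch — the prompt and `max_tokens` value are illustrative, port 8080 is the llama-server default, and `top_k`/`dry_multiplier` are llama.cpp extension fields rather than standard OpenAI parameters:

```python
import json

# Chat-completions request body mirroring the sampling settings above.
# POST this as JSON to http://localhost:8080/v1/chat/completions
# (8080 is llama-server's default port).
payload = {
    "messages": [
        {"role": "user", "content": "Explain the birthday paradox briefly."},
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,            # llama.cpp extension field
    "dry_multiplier": 0.5,  # llama.cpp extension field
    "max_tokens": 4096,     # illustrative cap
}
body = json.dumps(payload)
print(sorted(payload))
```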
## Source Models
| Source | Weight | Focus |
|---|---|---|
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 0.40 | Claude 4.6 Opus reasoning distillation |
| ValiantLabs/Qwen3.5-27B-Esper3.1 | 0.35 | Code / DevOps specialist |
| DavidAU/Qwen3.5-27B-Gemini3-Pro-High-Reasoning-Compact-Thinking | 0.25 | Gemini 3 Pro reasoning, compact thinking |
Base model: `Qwen/Qwen3.5-27B`
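The Task Arithmetic recipe reduces each fine-tune to a task vector (fine-tune minus base), scales the vectors by the merge weights in the table above, and adds their sum back onto the base weights. A toy per-parameter sketch — the numbers are made up, and the real merge operates tensor-by-tensor over the full checkpoints via the custom merger script:

```python
# Toy 3-way Task Arithmetic merge on a single made-up parameter vector.
base = [0.10, -0.20, 0.30]
finetunes = [
    [0.12, -0.18, 0.31],  # Claude-distill
    [0.09, -0.22, 0.33],  # Esper3.1
    [0.11, -0.19, 0.28],  # Gemini3-Pro
]
weights = [0.40, 0.35, 0.25]  # merge weights from the table above

# merged = base + sum_i( w_i * (finetune_i - base) )
merged = [
    b + sum(w * (ft[i] - b) for w, ft in zip(weights, finetunes))
    for i, b in enumerate(base)
]
print([round(x, 4) for x in merged])
```

Because the weights sum to 1.0 here, this is equivalent to a weighted average of the fine-tunes, but the task-vector formulation also covers weight sets that do not sum to 1.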
See the model card for full methodology, evaluation details, and the custom merger script.
## License
Apache-2.0