# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
GGUF quantizations of Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled by TheCyberVine
## Model Overview
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled is a reasoning model fine-tuned on the Qwen3.5 architecture. Its core focus is Chain-of-Thought (CoT) distillation, with training data sourced primarily from Claude-4.6 Opus interactions.
## Key Features
- Claude-4.6 Opus Reasoning Distillation: Deep distillation and structural imitation of Claude-4.6-Opus reasoning chains
- Structured Thinking: Uses `<think>` tags for internal reasoning, following a "Let me analyze this request carefully: 1... 2... 3..." pattern
- Native Developer Role Support: Fully supports the "developer" role without Jinja template patches
- Full Thinking Mode: Preserves the complete chain-of-thought reasoning process (`thinking=1`)
- 262K Context Window: Full context with no compromises
- Coding Agent Optimized: Tested and optimized for Claude Code and OpenCode environments
## Training Datasets
- nohurry/Opus-4.6-Reasoning-3000x-filtered - Comprehensive Claude 4.6 Opus reasoning trajectories
- TeichAI/claude-4.5-opus-high-reasoning-250x - High-intensity, structured reasoning instances
- Jackrong/Qwen3.5-reasoning-700x - Additional curated reasoning samples
## Quantizations Available
| Quantization | File Size | BPW | Imatrix | Recommended Use |
|---|---|---|---|---|
| IQ2_S | 9.36 GB | ~2.7 BPW | ✅ Custom | Minimal VRAM, basic tasks |
| IQ3_M | 12.6 GB | ~3.3 BPW | ✅ Custom | Balanced performance |
| TQ3_1S | 13.9 GB | 4.12 BPW | ❌ No | Best 3-bit option |
| IQ4_XS | 15.1 GB | ~4.2 BPW | ✅ Custom | Most users |
| Q8_0 | 28.6 GB | ~8.0 BPW | ❌ No | High quality |
## Quantization Details
### Custom Imatrix Calibration
The imatrix quantizations (IQ2_S, IQ3_M, IQ4_XS) use a custom importance matrix derived from OpenCode sessions with the following composition:
- 50% Reasoning - Complex problem-solving and analytical tasks
- 30% Tools - Command execution, file operations, and tool usage
- 20% TypeScript - TypeScript code generation and analysis
This calibration ensures optimal performance for coding agent workloads, maintaining reasoning quality while minimizing file size.
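The 50/30/20 mix above can be sketched as a simple weighted sampler. This is an illustrative reconstruction, not the actual calibration script; the pool names are hypothetical stand-ins for the OpenCode session dumps.

```python
import random

# Illustrative mix matching the stated composition; the pool names
# are hypothetical stand-ins for the actual OpenCode session dumps.
MIX = {"reasoning": 0.5, "tools": 0.3, "typescript": 0.2}

def build_calibration(pools: dict[str, list[str]],
                      mix: dict[str, float],
                      total: int,
                      seed: int = 0) -> list[str]:
    """Draw `total` calibration samples, proportionally to `mix`."""
    rng = random.Random(seed)
    out: list[str] = []
    for name, weight in mix.items():
        # Each pool contributes round(total * weight) samples.
        out.extend(rng.choices(pools[name], k=round(total * weight)))
    rng.shuffle(out)
    return out
```

The shuffled output would then be concatenated and fed to `llama-imatrix` as the calibration corpus.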
### TQ3_1S - Turbo Quantization
TQ3_1S is a ternary quantization (TQ) optimized for speed:
- Type: Ternary Quantization - uses the three values `{-1, 0, 1}`
- Architecture: WHT-rotated 3-bit with dual half-precision scaling
- Performance: Extremely fast on AVX2 CPUs (up to 2x faster than standard Q4_K)
- Best for: Users wanting a 3-bit quantization with maximum speed
Ternary quantization uses optimized 8-level Lloyd-Max centroids and Walsh-Hadamard Transform rotation for efficient weight distribution. Note: Actual BPW varies by layer (typically ~3.5-4.0) - verify for your specific model.
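As a rough illustration of the core idea, here is a minimal absmean ternarizer in Python. It maps each weight to `{-1, 0, 1}` with a single scale per block; the actual TQ3_1S kernel additionally applies the Walsh-Hadamard rotation, Lloyd-Max centroids, and dual half-precision scales described above, all of which are omitted here.

```python
def ternary_quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map each weight to {-1, 0, 1} with one absmean scale per block.

    Illustrative sketch only -- not the TQ3_1S kernel, which adds a
    Walsh-Hadamard rotation and dual half-precision scaling.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate weights from ternary codes."""
    return [v * scale for v in q]
```

Storing only a sign-and-zero code plus one scale per block is what makes ternary kernels so cheap on AVX2: the inner product reduces to additions, subtractions, and skips.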
## Usage with llama.cpp
### Basic Usage

```shell
# Download a quantization
huggingface-cli download superbudvar/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-IQ4_XS.gguf

# Run with llama-cli
llama-cli -m Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-IQ4_XS.gguf -c 4096 -t 8
```
### Full 262K Context

```shell
llama-cli -m Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-IQ4_XS.gguf -c 262144 -t 8
```
### Using with the Hugging Face Hub

```shell
llama-cli --hf-repo superbudvar/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF --hf-file Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-IQ4_XS.gguf -c 262144
```
## Prompt Template
This model uses the Qwen3 chat template:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a hello world program in Python.<|im_end|>
<|im_start|>assistant
```
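If you are driving the model through a raw completion endpoint rather than a chat API, the template can be assembled by hand. The sketch below covers only the basic system/user turn flow; the full Jinja template shipped in the GGUF metadata also handles tool calls and thinking blocks.

```python
def format_qwen3_chat(messages: list[dict[str, str]]) -> str:
    """Render messages into the Qwen3 <|im_start|>/<|im_end|> template.

    Minimal sketch: tool calls and thinking blocks, which the real
    Jinja template also handles, are omitted.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    # Open the assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```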
## Hardware Requirements
### VRAM Requirements (Approximate)
| Quantization | VRAM/RAM Required | Recommended GPU |
|---|---|---|
| IQ2_S | ~10 GB | RTX 3060 / RX 6600 XT |
| IQ3_M | ~13 GB | RTX 3060 12GB / RX 6700 XT |
| TQ3_1S | ~14 GB | RTX 3070 / RX 6750 XT |
| IQ4_XS | ~16 GB | RTX 3070 / RX 6800 XT |
| Q8_0 | ~29 GB | RTX 3090 / RX 7900 XT |
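These figures follow from a back-of-the-envelope rule: file size ≈ parameters × bits-per-weight / 8, plus a few percent for metadata and higher-precision layers. The 5% overhead factor below is an assumption, and actual VRAM use will be higher once the KV cache is allocated, especially at long context.

```python
def gguf_size_gb(params_b: float, bpw: float, overhead: float = 1.05) -> float:
    """Rough GGUF file size in GB for a model of `params_b` billion
    parameters at `bpw` bits per weight.

    The 5% overhead for metadata and higher-precision layers is an
    assumed ballpark, not a measured value.
    """
    return params_b * bpw / 8 * overhead
```

For example, `gguf_size_gb(27, 4.2)` lands near the 15.1 GB listed for IQ4_XS above.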
### CPU Inference
All quantizations work efficiently on modern CPUs with AVX2 or AVX-512 support. TQ3_1S is particularly optimized for AVX2 CPUs.
## Example Reasoning Output
The model demonstrates structured thinking with clear step-by-step reasoning:

```
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
```
This streamlined reasoning paradigm reduces redundant cognitive loops while preserving deep analytical capacity.
## Credits
- Base Model: Qwen/Qwen3.5-27B
- Fine-tuned Model: TheCyberVine/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled by TheCyberVine
- Training Framework: Unsloth 2026.3.3
- Datasets: see the Training Datasets section above
- Quantization: llama.cpp by ggerganov
- GGUF Format: ggml-org/ggml
## License
This quantization inherits the Apache 2.0 license from the base model.
## Citation
If you use this model in your research or projects, please cite the original model:

```bibtex
@misc{jackrong_qwen35_opus_distilled,
  title        = {Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/TheCyberVine/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled}}
}
```