---
base_model: lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled
library_name: gguf
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- lmstudio
- reasoning
- chain-of-thought
- qwen
- qwen3.6
- moe
- distillation
quantized_by: lordx64
license: apache-2.0
---

# Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled-GGUF

GGUF quantizations of [`lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled`](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled) for use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and [LM Studio](https://lmstudio.ai/).

The base model is a reasoning-distilled variant of Qwen3.6-35B-A3B fine-tuned to imitate the chain-of-thought style of Claude Opus 4.7. It thinks in explicit `<think>...</think>` blocks before producing the final answer.

## Quant files

See the file list for all available quant levels. Common choices:

| File | Quant | Approx size | Use case |
|---|---|---|---|
| `*.IQ4_XS.gguf` | IQ4_XS | ~18 GB | Smallest quant with good quality; default pick for LM Studio |
| `*.Q4_K_M.gguf` | Q4_K_M | ~21 GB | Balanced quality / size |
| `*.Q5_K_M.gguf` | Q5_K_M | ~25 GB | Higher quality |
| `*.Q8_0.gguf` | Q8_0 | ~35 GB | Near-lossless |

## Running in llama.cpp

```bash
# Serve the IQ4_XS quant locally with a 32k context and flash attention.
# Note: `turbo4` is not a KV-cache type in every llama.cpp build; if yours
# rejects it, use a type it supports, such as q8_0.
llama-server \
  -m Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled.IQ4_XS.gguf \
  --host 127.0.0.1 --port 18081 \
  -c 32768 -fa on \
  --cache-type-k q8_0 --cache-type-v turbo4
```

## Running in LM Studio

Search for `lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled-GGUF` in LM Studio's model browser and pick the quant that fits your RAM/VRAM. The model should appear automatically once Hugging Face indexes this repo.

## License

Apache 2.0, inherited from the base model. See [`lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled`](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled) for training details, evaluations, and intended use.
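
## Choosing a quant by memory budget

As a rough aid for the "pick the quant that fits your RAM/VRAM" advice above, the sketch below selects the largest quant from the table whose approximate size fits a given budget. The function name `pick_quant` is hypothetical, and the 4 GB default headroom for KV cache and runtime buffers is an assumption rather than a measured figure; adjust it for your context length and backend.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the largest quant that fits a memory budget.
# Sizes (GB) are the approximate figures from the table in this README.
# Arguments are whole gigabytes: pick_quant <budget_gb> [headroom_gb]
pick_quant() {
  local budget_gb="$1"
  local headroom_gb="${2:-4}"   # assumed default, not a measured value
  local best="none"
  local entry name size
  # Entries are ordered smallest to largest, so the last one that fits wins.
  for entry in IQ4_XS:18 Q4_K_M:21 Q5_K_M:25 Q8_0:35; do
    name="${entry%%:*}"
    size="${entry##*:}"
    if (( size + headroom_gb <= budget_gb )); then
      best="$name"
    fi
  done
  echo "$best"
}

pick_quant 24   # prints IQ4_XS
pick_quant 48   # prints Q8_0
```

The function prints `none` when even the smallest listed quant does not fit, in which case a smaller quant from the file list may still be an option.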