# AxionML Qwen3.5-27B-NVFP4 GGUF
This repository contains a GGUF conversion of AxionML/Qwen3.5-27B-NVFP4 for llama.cpp-compatible runtimes such as LM Studio.
## Files
- `AxionML-Qwen3.5-27B-NVFP4.gguf`: main model
- `mmproj-BF16.gguf`: multimodal projector for image support
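For image input, the main model and the projector are loaded together. A minimal sketch, assuming a recent llama.cpp build that ships the `llama-mtmd-cli` multimodal tool; the file paths and prompt are illustrative:

```bash
# Run the main model together with the multimodal projector for image input.
# Assumes a recent llama.cpp build that includes llama-mtmd-cli.
llama-mtmd-cli \
  -m AxionML-Qwen3.5-27B-NVFP4.gguf \
  --mmproj mmproj-BF16.gguf \
  --image ./example.png \
  -p "Describe this image."
```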
## Notes
- The main GGUF was converted from the Hugging Face NVFP4 checkpoint using `convert_hf_to_gguf.py` (see the conversion sketch after this list)
- Conversion was performed with `--outtype bf16`, producing a mixed-format GGUF that preserves supported tensor types and stores the required auxiliary tensors in floating point
- `mmproj-BF16.gguf` is used for image support
- The `mmproj-BF16.gguf` file was sourced from unsloth/Qwen3.5-27B-GGUF and verified working in local testing
- Original source model: AxionML/Qwen3.5-27B-NVFP4
- Base model family: Qwen/Qwen3.5-27B
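A minimal sketch of that conversion step, assuming a local llama.cpp checkout; the input and output paths are illustrative:

```bash
# Convert the Hugging Face checkpoint to GGUF with llama.cpp's converter.
# --outtype bf16 writes floating-point tensors as bf16; paths are illustrative.
python convert_hf_to_gguf.py ./AxionML-Qwen3.5-27B-NVFP4 \
  --outtype bf16 \
  --outfile AxionML-Qwen3.5-27B-NVFP4.gguf
```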
## License
Please refer to the upstream model license and attribution requirements.
## Qwen 3.5 vs Qwen 3 Benchmark Overview
Higher is better.
This repository is based on Qwen3.5-27B, one of the strongest balanced models in the Qwen family.
| Model | Knowledge & STEM | Instruction Following | Long Context | Math | Coding | General Agent | Multilingualism |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | 83 | 63 | 57 | 87 | 54 | 56 | 75 |
| Qwen3.5-122B-A10B | 85 | 76 | 63 | 91 | 59 | 75 | 79 |
| Qwen3-Next-80B-A3B-Thinking | 80 | 67 | 50 | 77 | 49 | 53 | 71 |
| Qwen3.5-35B-A3B | 84 | 74 | 58 | 89 | 55 | 74 | 77 |
| Qwen3-30B-A3B-Thinking-2507 | 78 | 62 | 47 | 68 | 46 | 42 | 69 |
| Qwen3.5-27B | 84 | 77 | 63 | 91 | 60 | 74 | 79 |
| Qwen3.5-9B | 80 | 70 | 59 | 83 | 47 | 73 | 73 |
| Qwen3.5-4B | 76 | 66 | 53 | 75 | 40 | 64 | 68 |
| Qwen3-4B-2507 | 72 | 59 | 37 | 63 | N/A | 41 | 61 |
| Qwen3.5-2B | 64 | 51 | 32 | 21 | N/A | 46 | 52 |
| Qwen3-1.7B | 57 | 42 | 17 | 9 | N/A | 18 | 47 |
| Qwen3.5-0.8B | 43 | 28 | 16 | N/A | N/A | N/A | 37 |
Benchmark note: the comparison table above is a community summary; the visual overview is based on this Reddit post: Visualizing all Qwen 3.5 vs Qwen 3 benchmarks
For official benchmark details from the Qwen team, see the benchmark section of: Qwen/Qwen3.5-27B
This repository is a GGUF conversion of AxionML/Qwen3.5-27B-NVFP4, stored as a mixed-format GGUF with native NVFP4 weights plus floating-point auxiliary tensors where required by the conversion/runtime.
Quantized checkpoints may show small quality differences relative to their base models on these benchmarks.
## Why Qwen3.5-27B stands out
Qwen3.5-27B delivers one of the strongest overall quality-to-size tradeoffs in the entire Qwen family.
Key highlights:
- Best reported coding score in this comparison with 60, ahead of every other listed model with a published coding result
- Top-tier math performance with 91, matching the strongest model in the table
- Excellent instruction following with 77
- Strong multilingual capability with 79
- Very strong long-context performance with 63, while remaining practical to run locally
In short, Qwen3.5-27B is a particularly compelling choice for users who want strong coding ability, excellent reasoning, large-context usability, and multilingual performance without stepping up to the largest flagship models.
## Practical local performance
In local testing on an NVIDIA GeForce RTX 5090 32GB, this GGUF build sustains 50+ tok/s across 80K–96K context windows.
That makes it especially attractive for:
- long-document analysis
- large codebase work
- multi-file reasoning
- extended chat sessions
- retrieval-heavy workflows
Performance note: this is a local test result on RTX 5090 32GB hardware. Actual throughput will vary depending on runtime version, context length, batch settings, prompt shape, and sampling configuration.
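For reference, a long-context run like the one above can be launched with a llama.cpp server along these lines. This is a sketch; the context size, GPU offload, and port values are illustrative assumptions rather than tested settings:

```bash
# Serve the model with a ~96K context window, offloading all layers to GPU.
# Values are illustrative; tune -c and -ngl for your hardware and runtime.
llama-server \
  -m AxionML-Qwen3.5-27B-NVFP4.gguf \
  --mmproj mmproj-BF16.gguf \
  -c 98304 \
  -ngl 99 \
  --port 8080
```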
## At a glance
Qwen3.5-27B combines:
- Best reported coding score in this comparison
- Top-tier math performance
- Strong long-context capability
- Excellent multilingual and instruction-following performance
- 50+ tok/s at 96K context in local RTX 5090 32GB testing
That makes this GGUF release a particularly strong option for users who want a model that is both high quality and practical to run locally.