AxionML Qwen3.5-27B-NVFP4 GGUF

This repository contains a GGUF conversion of AxionML/Qwen3.5-27B-NVFP4 for llama.cpp-compatible runtimes such as LM Studio.

Files

  • AxionML-Qwen3.5-27B-NVFP4.gguf β€” main model
  • mmproj-BF16.gguf β€” multimodal projector for image support

Notes

  • The main GGUF was converted from the Hugging Face NVFP4 checkpoint using convert_hf_to_gguf.py
  • Conversion was performed with --outtype bf16, producing a mixed-format GGUF that preserves the supported tensor types and stores the required auxiliary tensors in floating point (see the conversion sketch below this list)
  • mmproj-BF16.gguf provides image support; it was sourced from unsloth/Qwen3.5-27B-GGUF and verified working in local testing
  • Original source model: AxionML/Qwen3.5-27B-NVFP4
  • Base model family: Qwen/Qwen3.5-27B
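
For reference, here is a minimal sketch of the conversion step described above, assuming a local llama.cpp checkout next to a local copy of the source checkpoint; the paths are placeholders, not the exact invocation used for this release:

```python
# Minimal conversion sketch (not the exact command used for this release).
# Assumes llama.cpp has been cloned alongside a local copy of the HF checkpoint.
import subprocess

subprocess.run(
    [
        "python",
        "llama.cpp/convert_hf_to_gguf.py",   # conversion script shipped with llama.cpp
        "Qwen3.5-27B-NVFP4",                 # local directory holding the HF checkpoint
        "--outtype", "bf16",                 # keep floating-point auxiliary tensors in BF16
        "--outfile", "AxionML-Qwen3.5-27B-NVFP4.gguf",
    ],
    check=True,
)
```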

License

Please refer to the upstream model license and attribution requirements.


πŸ“Š Qwen 3.5 vs Qwen 3 Benchmark Overview

Higher is better.
This repository is based on Qwen3.5-27B, one of the strongest balanced models in the Qwen family.

| Model | Knowledge & STEM | Instruction Following | Long Context | Math | Coding | General Agent | Multilingualism |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | 83 | 63 | 57 | 87 | 54 | 56 | 75 |
| Qwen3.5-122B-A10B | 85 | 76 | 63 | 91 | 59 | 75 | 79 |
| Qwen3-Next-80B-A3B-Thinking | 80 | 67 | 50 | 77 | 49 | 53 | 71 |
| Qwen3.5-35B-A3B | 84 | 74 | 58 | 89 | 55 | 74 | 77 |
| Qwen3-30B-A3B-Thinking-2507 | 78 | 62 | 47 | 68 | 46 | 42 | 69 |
| Qwen3.5-27B | 84 | 77 | 63 | 91 | 60 | 74 | 79 |
| Qwen3.5-9B | 80 | 70 | 59 | 83 | 47 | 73 | 73 |
| Qwen3.5-4B | 76 | 66 | 53 | 75 | 40 | 64 | 68 |
| Qwen3-4B-2507 | 72 | 59 | 37 | 63 | N/A | 41 | 61 |
| Qwen3.5-2B | 64 | 51 | 32 | 21 | N/A | 46 | 52 |
| Qwen3-1.7B | 57 | 42 | 17 | 9 | N/A | 18 | 47 |
| Qwen3.5-0.8B | 43 | 28 | 16 | N/A | N/A | N/A | 37 |

Benchmark note: The comparison table above is a community summary; the visual overview is based on this Reddit post: Visualizing all Qwen 3.5 vs Qwen 3 benchmarks

For official benchmark details from the Qwen team, see the benchmark section of: Qwen/Qwen3.5-27B

This repository is a GGUF conversion of AxionML/Qwen3.5-27B-NVFP4, stored as a mixed-format GGUF with native NVFP4 weights plus floating-point auxiliary tensors where required by the conversion/runtime.
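
If you want to inspect that per-tensor layout yourself, the gguf Python package (published from the llama.cpp repository) can list each tensor's storage type. A minimal sketch, assuming the file is in the current directory:

```python
# Minimal sketch: print the storage type of every tensor in the mixed-format GGUF.
# Assumes `pip install gguf` and that the path below points at the downloaded file.
from gguf import GGUFReader

reader = GGUFReader("AxionML-Qwen3.5-27B-NVFP4.gguf")
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum, e.g. BF16 or F32
    print(f"{tensor.name:60s} {tensor.tensor_type.name}")
```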

Quantized checkpoints may show small benchmark-quality differences relative to their base model.

✨ Why Qwen3.5-27B stands out

Qwen3.5-27B delivers one of the strongest overall quality-to-size tradeoffs in the entire Qwen family.

Key highlights:

  • Best reported coding score in this comparison with 60, ahead of every other listed model with a published coding result
  • Top-tier math performance with 91, matching the strongest model in the table
  • Excellent instruction following with 77
  • Strong multilingual capability with 79
  • Very strong long-context performance with 63, while remaining practical to run locally

In short, Qwen3.5-27B is a particularly compelling choice for users who want strong coding ability, excellent reasoning, large-context usability, and multilingual performance without stepping up to the largest flagship models.

πŸš€ Practical local performance

In local testing on an NVIDIA GeForce RTX 5090 32GB, this GGUF build sustains 50+ tok/s across 80K–96K context windows.

That makes it especially attractive for:

  • long-document analysis
  • large codebase work
  • multi-file reasoning
  • extended chat sessions
  • retrieval-heavy workflows

Performance note: this is a local test result on RTX 5090 32GB hardware. Actual throughput will vary depending on runtime version, context length, batch settings, prompt shape, and sampling configuration.
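
As a rough starting point, the sketch below shows one way to load this GGUF with a large context window using llama-cpp-python; the parameter values are illustrative assumptions, and LM Studio or the llama.cpp server expose equivalent settings in their UIs/configs.

```python
# Minimal sketch, assuming `pip install llama-cpp-python` built with CUDA support
# and both GGUF files downloaded into the current directory. Values are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="AxionML-Qwen3.5-27B-NVFP4.gguf",
    n_ctx=98304,       # ~96K-token context window
    n_gpu_layers=-1,   # offload all layers to the GPU
    # Image input additionally requires mmproj-BF16.gguf via a multimodal-capable
    # runtime such as LM Studio or llama.cpp's multimodal tooling.
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the strengths of Qwen3.5-27B."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```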

πŸ”₯ At a glance

Qwen3.5-27B combines:

  • Best reported coding score in this comparison
  • Top-tier math performance
  • Strong long-context capability
  • Excellent multilingual and instruction-following performance
  • 50+ tok/s at 96K context in local RTX 5090 32GB testing

That makes this GGUF release a particularly strong option for users who want a model that is both high quality and practical to run locally.

