QuickThinker Qwen3.5-27B Vision GGUF

This release is part of the QuickThinker Series: a line of fine-tuned Qwen models focused on lower thinking-token usage, faster reasoning, less looping, cleaner stopping behavior, stronger prompt adherence, local tool use and tool chains, and practical local workflows. These models are intentionally more direct and to the point.

QuickThinker Qwen3.5-27B Vision GGUF is based on Qwen/Qwen3.5-27B and packaged for local multimodal inference in GGUF format. The QuickThinker Series is built for fast local inference and aims to preserve base-model quality while keeping thinking enabled, yet reducing the thinking-token budget by roughly 60 to 70 percent.

Base Model

This model is based on:

  • Qwen/Qwen3.5-27B

What This Release Tries To Improve

  • reduced looping behavior
  • fewer wasted thinking tokens
  • stronger prompt adherence on structured tasks
  • better handling of contradictions, underdetermined prompts, and insufficient-information cases
  • more stable local assistant behavior
  • better tool calling for tools like OpenCode and Osaurus

Training Style

This release was trained on the current final FineVine rebuild, a custom curated dataset emphasizing:

  • concise but still substantive answers
  • cleaner stopping behavior
  • practical coding and reasoning tasks
  • image interpretation quality
  • political consistency grounding

The majority of the final training data is custom-curated and edited.

Included Files

This package currently contains:

  • QuickThinker-Qwen3.5-27B-Vision-Q4_K_M.gguf
  • QuickThinker-Qwen3.5-27B-Vision-Q6_K.gguf
  • QuickThinker-Qwen3.5-27B-Vision-Q8_0.gguf
  • QuickThinker-Qwen3.5-27B-Vision-f16.gguf
  • QuickThinker-Qwen3.5-27B-Vision-mmproj-f16.gguf
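
As a sketch, one quant plus the mmproj file can be fetched with the Hugging Face CLI (the repo id here assumes the model lives at pt-ml/QuickThinker-Qwen3.5-27B-Vision-GGUF; adjust to wherever you host it):

```shell
# Download the Q4_K_M quant and the matching mmproj file into the current directory
huggingface-cli download pt-ml/QuickThinker-Qwen3.5-27B-Vision-GGUF \
  QuickThinker-Qwen3.5-27B-Vision-Q4_K_M.gguf \
  QuickThinker-Qwen3.5-27B-Vision-mmproj-f16.gguf \
  --local-dir .
```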

Important GGUF Note

For multimodal use in llama.cpp, you will typically need:

  • one language-model GGUF
  • the matching mmproj GGUF
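
A minimal invocation sketch, assuming a recent llama.cpp build that ships the multimodal CLI (llama-mtmd-cli) and an image file of your choosing:

```shell
# Pass the language-model GGUF with -m and the matching mmproj with --mmproj
llama-mtmd-cli \
  -m QuickThinker-Qwen3.5-27B-Vision-Q4_K_M.gguf \
  --mmproj QuickThinker-Qwen3.5-27B-Vision-mmproj-f16.gguf \
  --image photo.jpg \
  -p "Describe this image."
```

Without the mmproj file, the model loads but cannot process images, so keep the two files together.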

Suggested Parameters

  • temperature = 0.6
  • top_p = 0.95
  • top_k = 20
  • min_p = 0.0
  • presence_penalty = 0.0
  • repetition_penalty = 1.0
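
The parameters above map directly onto llama.cpp sampling flags. A hedged sketch of serving the model with these defaults via llama-server (flag names per current llama.cpp; port is arbitrary):

```shell
# Serve the model with the suggested sampling parameters as server-side defaults
llama-server \
  -m QuickThinker-Qwen3.5-27B-Vision-Q4_K_M.gguf \
  --mmproj QuickThinker-Qwen3.5-27B-Vision-mmproj-f16.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 \
  --presence-penalty 0.0 --repeat-penalty 1.0 \
  --port 8080
```

Clients hitting the OpenAI-compatible endpoint can still override any of these per request.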

Intended Use

This release is meant for:

  • local multimodal assistant use
  • direct-answer tasks
  • practical coding help
  • structured reasoning

Important Note

This model is not intended as a warm, roleplay-first chatbot. It is tuned more for directness, bounded reasoning, and practical usefulness.

This model is not a good fit for:

  • storytelling
  • role playing