---
license: apache-2.0
base_model:
  - Qwen/Qwen3-4B-Instruct-2507
language:
  - en
pipeline_tag: text-generation
library_name: transformers
tags:
  - text-generation-inference
  - it
  - llama.cpp
---

# Qwen3-4B-Instruct-2507-GGUF

**Qwen3-4B-Instruct-2507** is a 4-billion-parameter causal language model designed for advanced instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It delivers significant improvements in these general capabilities and substantial gains in long-tail knowledge coverage across multiple languages. The model operates in a non-thinking mode (it does not generate explicit reasoning-step tags), features 36 layers with 32 query and 8 key-value attention heads using GQA, and supports a native context length of 262,144 tokens.

The model is pretrained and post-trained for enhanced performance and alignment with user preferences, excelling in subjective and open-ended tasks with more helpful, higher-quality responses. It can be deployed efficiently via popular toolkits such as Hugging Face Transformers, SGLang, and vLLM, and its long-context support suits complex tasks. Code examples, benchmark evaluations, deployment instructions, and agentic tool-calling with Qwen-Agent are fully documented in the official repository on Hugging Face.
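As a minimal illustration of prompting the GGUF builds directly (for example through llama.cpp), the sketch below assembles a ChatML-style prompt by hand. This assumes the GGUF files carry Qwen's usual chat template with `<|im_start|>` / `<|im_end|>` markers; verify against the template embedded in your file before relying on it.

```python
# Sketch: build a ChatML-style prompt string for a llama.cpp run of this model.
# Assumption: the GGUF uses Qwen's standard <|im_start|>/<|im_end|> template.

def build_prompt(messages):
    """Render a list of {"role", "content"} dicts into ChatML text."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain GQA in one sentence."},
])
print(prompt)
```

In practice you would pass this string (or, more robustly, the tokenizer's own `apply_chat_template`) to your runtime rather than hand-rolling the format.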

## Model Files

| File Name | Size | Quant Type |
|-----------|------|------------|
| Qwen3-4B-Thinking-2507.BF16.gguf | 8.05 GB | BF16 |
| Qwen3-4B-Thinking-2507.F16.gguf | 8.05 GB | F16 |
| Qwen3-4B-Thinking-2507.F32.gguf | 16.1 GB | F32 |
| Qwen3-4B-Thinking-2507.Q2_K.gguf | 1.67 GB | Q2_K |
| Qwen3-4B-Thinking-2507.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| Qwen3-4B-Thinking-2507.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| Qwen3-4B-Thinking-2507.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| Qwen3-4B-Thinking-2507.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| Qwen3-4B-Thinking-2507.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| Qwen3-4B-Thinking-2507.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| Qwen3-4B-Thinking-2507.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| Qwen3-4B-Thinking-2507.Q6_K.gguf | 3.31 GB | Q6_K |
| Qwen3-4B-Thinking-2507.Q8_0.gguf | 4.28 GB | Q8_0 |
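A rough way to compare the quants in the table is effective bits per weight: file size in bits divided by the parameter count (about 4 billion here). The sketch below computes this from a few of the sizes listed above; the numbers are approximate, since GB reporting and non-weight tensors both add slack.

```python
# Estimate effective bits per weight for some quants from their file sizes.
# Sizes are taken from the table above; the parameter count is approximate.

PARAMS = 4.0e9  # approximate parameter count of a 4B model

sizes_gb = {
    "Q2_K": 1.67,
    "Q4_K_M": 2.50,
    "Q6_K": 3.31,
    "Q8_0": 4.28,
    "F16": 8.05,
}

def bits_per_weight(size_gb, params=PARAMS):
    """File size in bits divided by parameter count."""
    return size_gb * 1e9 * 8 / params

for name, gb in sizes_gb.items():
    print(f"{name}: ~{bits_per_weight(gb):.1f} bits/weight")
```

This makes the naming intuitive: Q4_K_M lands near 5 bits/weight (the K-quant overhead above the nominal 4 bits comes from scales and higher-precision blocks), and F16 comes out at about 16.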

## Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similarly sized non-IQ quants.)
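A common rule of thumb when choosing among these files is to take the largest quant whose size, plus some headroom for the KV cache and runtime overhead, fits your available memory. The helper below is a hypothetical sketch using the sizes from the Model Files table; the 1.5 GB headroom figure is an assumption you should tune for your context length.

```python
# Hypothetical helper: pick the largest quant that fits a memory budget.
# Sizes (GB) come from the Model Files table above; headroom_gb is a guess
# covering KV cache and runtime overhead, and should be tuned per setup.

QUANTS = [  # (name, size in GB), sorted ascending by size
    ("Q2_K", 1.67), ("Q3_K_S", 1.89), ("Q3_K_M", 2.08), ("Q3_K_L", 2.24),
    ("Q4_K_S", 2.38), ("Q4_K_M", 2.50), ("Q5_K_S", 2.82), ("Q5_K_M", 2.89),
    ("Q6_K", 3.31), ("Q8_0", 4.28), ("BF16", 8.05),
]

def pick_quant(budget_gb, headroom_gb=1.5):
    """Return (name, size) of the largest quant fitting the budget, or None."""
    usable = budget_gb - headroom_gb
    best = None
    for name, size in QUANTS:
        if size <= usable:
            best = (name, size)
    return best

print(pick_quant(8.0))  # e.g. an 8 GB GPU
```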

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

![Quant-type comparison graph by ikawrakow](image.png)