# Qwen3-4B
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
The Qwen3-4B is a state-of-the-art multilingual base language model with 4 billion parameters, excelling in language understanding, generation, coding, and mathematics.
Model Conversion Contributor: carrycooldude
## Model Stats
- Input sequence length for Prompt Processor: 128
- Maximum context length: 4096
- Quantization Type: w4a16 (4-bit weights with 16-bit activations)
- Supported languages: 100+ languages and dialects.
- TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
- Response Rate: Rate of response generation after the first response token. Measured on a short prompt with a long response; may slow down when using longer context lengths.
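The two latency metrics above can be computed from per-token timestamps. A minimal sketch of that arithmetic (the timestamps and the `latency_metrics` helper are hypothetical, not part of any Qwen or AI Hub API):

```python
def latency_metrics(request_time, token_times):
    """Compute TTFT and response rate from per-token emission timestamps.

    request_time: wall-clock time the prompt was submitted (seconds).
    token_times: wall-clock times at which each response token was emitted.
    """
    ttft = token_times[0] - request_time  # time to first token
    # Response rate: tokens per second generated *after* the first token.
    decode_time = token_times[-1] - token_times[0]
    rate = (len(token_times) - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, rate

# Hypothetical trace: prompt sent at t=0.0 s, five tokens emitted afterwards.
ttft, rate = latency_metrics(0.0, [0.8, 0.9, 1.0, 1.1, 1.2])
# ttft = 0.8 s; rate = 4 tokens / 0.4 s = 10 tokens/s
```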
## Model Details
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Context Length Support: Up to 4096 tokens (optimized for on-device)
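The gap between the two parameter counts above is essentially the token-embedding table. As a sanity check, assuming the Qwen3 vocabulary of 151,936 tokens and a hidden size of 2,560 (figures from the upstream Qwen3 release, not stated in this card), the embedding matrix alone accounts for roughly the 0.4B difference:

```python
# Assumed Qwen3-4B dimensions (from the upstream Qwen3 release, not this card).
vocab_size = 151_936
hidden_size = 2_560

# One embedding row per vocabulary token.
embedding_params = vocab_size * hidden_size
print(f"{embedding_params / 1e9:.2f}B")  # ~0.39B, close to 4.0B - 3.6B
```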
For more details, please refer to the official Qwen3 Blog, GitHub, and Documentation.
## Model Download
| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Target Model | Performance |
|---|---|---|---|---|---|---|
| Qwen3-4B | Snapdragon 8 Elite (QCS9075) | QNN | W4A16 | NPU | Qwen3-4B-onnx-w4a16.zip | Check in AI Hub |
## Model Inference & Conversion

### Using Qualcomm AI Hub
You can export and convert this model using Qualcomm AI Hub Models (minimum package version: 0.48.0):

```shell
# Install AI Hub Models (quote the spec so the shell does not treat ">" as a redirect)
pip install "qai-hub-models>=0.48.0"

# Export the model with --zip-assets to generate the required format
python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output
```
Note: use the `--zip-assets` argument so the model is saved in the format required by the community repository.
## Repository Structure

```
Qwen3-4B/
├── LICENSE
├── README.md
├── .gitattributes
└── Qwen3-4B-onnx-w4a16.zip
```
### ONNX Export (internal structure)

```
Qwen3-4B_onnx_w4a16/
├── tool_versions.yaml
├── model.onnx
├── model.data
├── model.encodings
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
### tool_versions.yaml

```yaml
tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
```
## License

- Source Model: Apache-2.0
- Deployable Model: Apache-2.0