# Qwen3-4B
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
The Qwen3-4B is a state-of-the-art multilingual base language model with 4 billion parameters, excelling in language understanding, generation, coding, and mathematics.
Model Conversion Contributor: carrycooldude
## Model Stats
- Input sequence length for Prompt Processor: 128
- Maximum context length: 4096
- Quantization Type: w4a16 (4-bit weights with 16-bit activations)
- Supported languages: 100+ languages and dialects.
- TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
- Response Rate: Rate of response generation after the first response token. Measured on a short prompt with a long response; may slow down when using longer context lengths.
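The two latency metrics above can be computed from per-token timestamps. A minimal sketch of that arithmetic (the timestamps and the `latency_metrics` helper are hypothetical, not part of any Qwen or AI Hub API):

```python
def latency_metrics(request_time, token_times):
    """Compute TTFT and response rate from per-token emission timestamps.

    request_time: wall-clock time the prompt was submitted (seconds).
    token_times: wall-clock times at which each response token was emitted.
    """
    ttft = token_times[0] - request_time  # time to first token
    # Response rate: tokens per second generated *after* the first token.
    decode_time = token_times[-1] - token_times[0]
    rate = (len(token_times) - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, rate

# Hypothetical trace: prompt sent at t=0.0 s, five tokens emitted afterwards.
ttft, rate = latency_metrics(0.0, [0.8, 0.9, 1.0, 1.1, 1.2])
# ttft = 0.8 s; rate = 4 tokens / 0.4 s = 10 tokens/s
```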
## Model Details
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Context Length Support: Up to 4096 tokens (optimized for on-device)
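The gap between the two parameter counts above is essentially the token-embedding table. As a sanity check, assuming the Qwen3 vocabulary of 151,936 tokens and a hidden size of 2,560 (figures from the upstream Qwen3 release, not stated in this card), the embedding matrix alone accounts for roughly the 0.4B difference:

```python
# Assumed Qwen3-4B dimensions (from the upstream Qwen3 release, not this card).
vocab_size = 151_936
hidden_size = 2_560

# One embedding row per vocabulary token.
embedding_params = vocab_size * hidden_size
print(f"{embedding_params / 1e9:.2f}B")  # ~0.39B, close to 4.0B - 3.6B
```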
For more details, please refer to the official Qwen3 Blog, GitHub, and Documentation.
## Model Download
| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Target Model | Performance |
|---|---|---|---|---|---|---|
| Qwen3-4B | Snapdragon 8 Elite (QCS9075) | QNN | W4A16 | NPU | Qwen3-4B-onnx-w4a16.zip | Check in AI Hub |
## Model Inference & Conversion

### Using Qualcomm AI Hub
You can export and convert this model using Qualcomm AI Hub Models (minimum package version: 0.48.0):

```shell
# Install AI Hub Models (quote the spec so the shell does not treat ">" as a redirect)
pip install "qai-hub-models>=0.48.0"

# Export the model with --zip-assets to generate the required format
python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output
```
Note: use the `--zip-assets` argument so the model is saved in the format required by the community repository.
## Repository Structure

```
Qwen3-4B/
├── LICENSE
├── README.md
├── .gitattributes
└── Qwen3-4B-onnx-w4a16.zip
```
### ONNX Export (internal structure)

```
Qwen3-4B_onnx_w4a16/
├── tool_versions.yaml
├── model.onnx
├── model.data
├── model.encodings
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
### tool_versions.yaml

```yaml
tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
```
## License

- Source Model: Apache-2.0
- Deployable Model: Apache-2.0