---
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen3-4B
tags:
- qwen
- qwen3
- causal-lm
- qualcomm
- ai-hub
- on-device
- onnx
- qnn
pipeline_tag: text-generation
---

## Qwen3-4B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

**Qwen3-4B** is a multilingual dense language model with 4 billion parameters, covering language understanding, generation, coding, and mathematics.

**Model Conversion Contributor**: [carrycooldude](https://github.com/carrycooldude)

**Model Stats:**
- Input sequence length for Prompt Processor: 128
- Maximum context length: 4096
- Quantization Type: w4a16 (4-bit weights with 16-bit activations)
- Supported languages: 100+ languages and dialects
- TTFT (Time To First Token): the time it takes to generate the first response token. It is expressed as a range because it depends on prompt length: the lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt that uses the full context length (4096 tokens).
- Response Rate: the rate of response generation after the first response token, measured on a short prompt with a long response. It may drop at longer context lengths.

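The TTFT and Response Rate definitions above imply some simple arithmetic. The following is a minimal sketch that assumes the prompt processor consumes the prompt in fixed 128-token chunks and that Response Rate is reported in tokens per second. The function names and the example timings are hypothetical, not measurements of this model.

```python
import math

# Values stated in the Model Stats list.
PROMPT_PROCESSOR_SEQ_LEN = 128   # tokens consumed per prompt-processor iteration
MAX_CONTEXT_LEN = 4096           # maximum context length


def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Prompt-processor passes needed before the first token can be produced.
    ASSUMPTION: the prompt is processed in fixed 128-token chunks."""
    if not 0 < prompt_tokens <= MAX_CONTEXT_LEN:
        raise ValueError(f"prompt must be 1..{MAX_CONTEXT_LEN} tokens")
    return math.ceil(prompt_tokens / PROMPT_PROCESSOR_SEQ_LEN)


def estimated_latency_s(ttft_s: float, response_rate_tps: float,
                        response_tokens: int) -> float:
    """Rough end-to-end latency: TTFT covers the first token, and the
    remaining tokens arrive at the response rate.
    ASSUMPTION: response_rate_tps is in tokens per second."""
    if response_tokens < 1:
        raise ValueError("response must contain at least one token")
    return ttft_s + (response_tokens - 1) / response_rate_tps


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; substitute measured values.
    print(prompt_processor_iterations(100))    # 1 chunk: lower end of the TTFT range
    print(prompt_processor_iterations(4096))   # 32 chunks: upper end of the TTFT range
    print(estimated_latency_s(ttft_s=0.5, response_rate_tps=20.0, response_tokens=201))
```

Because Response Rate is measured on a short prompt and may drop at longer context lengths, this estimate is optimistic for long prompts.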
## Model Details

- **Type**: Causal Language Model
- **Training Stage**: Pretraining & Post-training
- **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
- **Number of Parameters**: 4.0B
- **Number of Parameters (Non-Embedding)**: 3.6B
- **Context Length Support**: Up to 4096 tokens in this on-device export (the upstream Qwen3-4B model supports 32,768 tokens natively)

For more details, please refer to the official [Qwen3 Blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

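A back-of-the-envelope memory estimate shows why w4a16 quantization and GQA matter at a 4096-token context. The parameter counts come from the Model Details list. The layer count, head counts, and head dimension are assumptions taken from the upstream Qwen/Qwen3-4B configuration, so check them against your export. The sketch also assumes a 16-bit KV cache (matching the a16 activations) and ignores quantization scales, runtime buffers, and any tensors kept at higher precision. Under these assumptions, the KV cache for 4096 tokens is 576 MiB with 8 KV heads, compared with 2,304 MiB if all 32 heads were cached.

```python
# Parameter counts from the Model Details list.
TOTAL_PARAMS = 4.0e9
NON_EMBEDDING_PARAMS = 3.6e9

# ASSUMPTION: attention shape from the upstream Qwen/Qwen3-4B config;
# verify against the config of your own export.
NUM_LAYERS = 36
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8   # GQA: each K/V head is shared by 32 / 8 = 4 query heads
HEAD_DIM = 128


def weight_bytes(params: float, bits_per_weight: int = 4) -> float:
    """Raw weight storage at the given bit width (w4 -> 4 bits per weight).
    Ignores per-channel scales/offsets and higher-precision tensors."""
    return params * bits_per_weight / 8


def kv_cache_bytes(context_len: int = 4096, kv_heads: int = NUM_KV_HEADS,
                   bytes_per_value: int = 2) -> int:
    """K and V caches across all layers for `context_len` tokens.
    ASSUMPTION: 16-bit (2-byte) cache values."""
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * context_len * bytes_per_value


if __name__ == "__main__":
    gib, mib = 1024 ** 3, 1024 ** 2
    print(f"w4 weights, all params: {weight_bytes(TOTAL_PARAMS) / gib:.2f} GiB")
    print(f"KV cache with GQA ({NUM_KV_HEADS} KV heads): {kv_cache_bytes() / mib:.0f} MiB")
    print(f"KV cache without GQA ({NUM_QUERY_HEADS} KV heads): "
          f"{kv_cache_bytes(kv_heads=NUM_QUERY_HEADS) / mib:.0f} MiB")
```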
## Model Download

| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Target Model | Performance |
|-------|---------|---------------|-----------|---------------------|-------------|-------------|
| Qwen3-4B | QCS9075 | QNN | W4A16 | NPU | [Qwen3-4B-onnx-w4a16.zip](./Qwen3-4B-onnx-w4a16.zip) | [Check in AI Hub](https://aihub.qualcomm.com/) |

## Model Inference & Conversion

### Using Qualcomm AI Hub

You can export and convert this model using [Qualcomm AI Hub Models](https://github.com/quic/ai-hub-models) (minimum package version: 0.48.0):

```bash
# Install AI Hub Models (quote the requirement so the shell does not treat ">" as a redirect)
pip install "qai-hub-models>=0.48.0"

# Export the model with --zip-assets to generate the required format
python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output
```

> **Note**: Use the `--zip-assets` argument to ensure the model is saved in the required community repository format.

## Repository Structure

```
Qwen3-4B/
├── LICENSE
├── README.md
├── .gitattributes
└── Qwen3-4B-onnx-w4a16.zip
```

### ONNX Export (Internal Structure)

```
Qwen3-4B_onnx_w4a16/
├── tool_versions.yaml
├── model.onnx
├── model.data
├── model.encodings
├── tokenizer.json
├── tokenizer_config.json
└── ...
```

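After downloading, you can run a quick integrity check to confirm that the archive contains the files listed in the internal structure. The following is a minimal sketch that uses Python's standard `zipfile` module. The function name is hypothetical. It matches on base file names only, so it works whether or not the files sit inside a top-level `Qwen3-4B_onnx_w4a16/` folder, and it skips the elided `...` entries.

```python
import zipfile
from pathlib import PurePosixPath

# Files named in the internal structure listing; "..." entries are not checked.
EXPECTED_FILES = {
    "tool_versions.yaml",
    "model.onnx",
    "model.data",
    "model.encodings",
    "tokenizer.json",
    "tokenizer_config.json",
}


def missing_files(zip_path: str) -> set[str]:
    """Return the expected file names that do not appear anywhere in the archive.
    Directory entries are skipped; files are matched by base name."""
    with zipfile.ZipFile(zip_path) as zf:
        present = {
            PurePosixPath(name).name
            for name in zf.namelist()
            if not name.endswith("/")
        }
    return EXPECTED_FILES - present
```

For example, `missing_files("Qwen3-4B-onnx-w4a16.zip")` should return an empty set for a complete download.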
### tool_versions.yaml

```yaml
tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
```

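To confirm that an unpacked export was built with a recent enough toolchain, you can read the versions back programmatically. The following is a minimal sketch that uses only the Python standard library. The function names are hypothetical, and it handles just the flat mapping shown in `tool_versions.yaml`. A real pipeline would more likely use a YAML library such as PyYAML.

```python
import re

# Matches the qai-hub-models>=0.48.0 pin in the install command.
MIN_AIHM_VERSION = (0, 48, 0)


def parse_tool_versions(text: str) -> dict[str, str]:
    """Read the flat `tool_versions:` mapping.

    Minimal line-based reader, not a YAML parser: it only understands
    indented `key: value` lines directly under `tool_versions:`.
    """
    versions: dict[str, str] = {}
    in_block = False
    for line in text.splitlines():
        if line.strip() == "tool_versions:":
            in_block = True
            continue
        match = re.match(r"^\s+([\w.-]+):\s*(\S+)\s*$", line)
        if in_block and match:
            versions[match.group(1)] = match.group(2).strip("'\"")
        elif line and not line[0].isspace():
            in_block = False  # any other top-level key ends the block
    return versions


def version_tuple(version: str) -> tuple[int, ...]:
    """'0.48.0' -> (0, 48, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    sample = "tool_versions:\n  aihm_version: 0.48.0\n  qairt: 2.34.0\n"
    versions = parse_tool_versions(sample)
    ok = version_tuple(versions["aihm_version"]) >= MIN_AIHM_VERSION
    print(versions, "aihm_version OK" if ok else "aihm_version too old")
```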
## License

- **Source Model**: [APACHE-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)
- **Deployable Model**: [APACHE-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)

## Disclaimer

This is a community contribution. The models hosted here are user contributions and:
- Are not verified by the organization or maintainers for correctness, safety, or performance.
- May contain errors, bugs, or limitations.
- Are moderated only for structural compliance, not for content quality.

The organization and maintainers do not take responsibility for the models or assets contributed here. Use them at your own discretion.