---
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen3-4B
tags:
- qwen
- qwen3
- causal-lm
- qualcomm
- ai-hub
- on-device
- onnx
- qnn
pipeline_tag: text-generation
---

## Qwen3-4B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

**Qwen3-4B** is a state-of-the-art multilingual language model with 4 billion parameters, excelling in language understanding, generation, coding, and mathematics.

**Model Conversion Contributor**: [carrycooldude](https://github.com/carrycooldude)

**Model Stats:**
- Input sequence length for Prompt Processor: 128
- Maximum context length: 4096
- Quantization Type: w4a16 (4-bit weights with 16-bit activations)
- Supported languages: 100+ languages and dialects.
- TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
- Response Rate: Rate of response generation after the first response token. Measured on a short prompt with a long response; may slow down when using longer context lengths.
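
To see why TTFT is reported as a range: the prompt processor ingests 128 tokens per pass, so the number of passes before the first token grows with prompt length. A back-of-the-envelope helper (illustrative only, not part of the released assets):

```python
import math

# Prompt-processor sequence length and context window from the model stats above.
SEQ_LEN = 128
MAX_CONTEXT = 4096

def prompt_iterations(prompt_tokens: int, seq_len: int = SEQ_LEN) -> int:
    """Number of prompt-processor passes needed to ingest a prompt."""
    if not 0 < prompt_tokens <= MAX_CONTEXT:
        raise ValueError("prompt must fit within the context window")
    return math.ceil(prompt_tokens / seq_len)

# A short prompt needs a single pass (the TTFT lower bound) ...
print(prompt_iterations(100))   # 1
# ... while a full-context prompt needs 4096 / 128 = 32 passes (the upper bound).
print(prompt_iterations(4096))  # 32
```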

## Model Details

- **Type**: Causal Language Models
- **Training Stage**: Pretraining & Post-training
- **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
- **Number of Parameters**: 4.0B
- **Number of Parameters (Non-Embedding)**: 3.6B
- **Context Length Support**: Up to 4096 tokens (optimized for on-device)

For more details, please refer to the official [Qwen3 Blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Model Download

| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Target Model | Performance |
|-------|---------|---------------|-----------|---------------------|-------------|-------------|
| Qwen3-4B | Snapdragon 8 Elite (QCS9075) | QNN | W4A16 | NPU | [Qwen3-4B-onnx-w4a16.zip](./Qwen3-4B-onnx-w4a16.zip) | [Check in AI Hub](https://aihub.qualcomm.com/) |

## Model Inference & Conversion

### Using Qualcomm AI Hub

You can export and convert this model using [Qualcomm AI Hub Models](https://github.com/quic/ai-hub-models) (minimum package version: 0.48.0):

```bash
# Install AI Hub Models
pip install "qai-hub-models>=0.48.0"

# Export the model with --zip-assets to generate the required format
python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output
```

> **Note**: Use the `--zip-assets` argument to ensure the model is saved in the required community repository format.

## Repository Structure

```
Qwen3-4B/
β”œβ”€β”€ LICENSE
β”œβ”€β”€ README.md
β”œβ”€β”€ .gitattributes
└── Qwen3-4B-onnx-w4a16.zip
```

### ONNX Export (Internal structure)

```
Qwen3-4B_onnx_w4a16/
β”œβ”€β”€ tool_versions.yaml
β”œβ”€β”€ model.onnx
β”œβ”€β”€ model.data
β”œβ”€β”€ model.encodings
β”œβ”€β”€ tokenizer.json
β”œβ”€β”€ tokenizer_config.json
└── ...
```
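
After downloading, you can sanity-check that the extracted archive matches the layout above. The helper below is an illustrative sketch (not part of the AI Hub toolchain) that reports any expected files missing from the zip:

```python
import zipfile

# Files the exported archive is expected to contain (per the layout above).
REQUIRED = {
    "tool_versions.yaml",
    "model.onnx",
    "model.data",
    "model.encodings",
    "tokenizer.json",
    "tokenizer_config.json",
}

def missing_assets(zip_path: str) -> set[str]:
    """Return required files absent from the archive (empty set means OK)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Compare basenames so a top-level folder prefix is tolerated.
        members = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return REQUIRED - members
```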

### tool_versions.yaml

```yaml
tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
```

## License

- **Source Model**: [Apache-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)
- **Deployable Model**: [Apache-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)

## Disclaimer

This is a community contribution. The models hosted here are user contributions and:
- Are not verified by the organization or maintainers for correctness, safety, or performance.
- May contain errors, bugs, or limitations.
- Are moderated only for structural compliance, not for content quality.

The organization and maintainers do not take responsibility for the models or assets contributed here. Use them at your own discretion.