	---
	language:
	- en
	- zh
	- fr
	- es
	- pt
	- de
	- it
	- ru
	- ja
	- ko
	- vi
	- th
	- ar
	license: apache-2.0
	library_name: transformers
	base_model:
	- Qwen/Qwen3-4B
	tags:
	- qwen
	- qwen3
	- causal-lm
	- qualcomm
	- ai-hub
	- on-device
	- onnx
	- qnn
	pipeline_tag: text-generation
	---

	## Qwen3-4B

	Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

	Qwen3-4B is a multilingual language model with 4 billion parameters, with strong performance in language understanding, generation, coding, and mathematics.

	Model Conversion Contributor: [carrycooldude](https://github.com/carrycooldude)

	Model Stats:
	- Input sequence length for Prompt Processor: 128
	- Maximum context length: 4096
	- Quantization Type: w4a16 (4-bit weights with 16-bit activations)
	- Supported languages: 100+ languages and dialects.
	- TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
	- Response Rate: Rate of response generation after the first response token. Measured on a short prompt with a long response; may slow down when using longer context lengths.
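
The TTFT range above follows from the prompt processor consuming input in fixed chunks of 128 tokens. As a rough sketch, assuming one prompt-processor pass per 128-token chunk with the final chunk padded (the helper name `prompt_processor_iterations` is hypothetical and not part of any Qualcomm tooling), the number of passes for a given prompt length can be estimated like this:

```python
import math

PROMPT_CHUNK = 128   # input sequence length of the prompt processor (Model Stats)
MAX_CONTEXT = 4096   # maximum context length (Model Stats)


def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Hypothetical helper: passes of the prompt processor needed for a prompt.

    Assumes each pass handles exactly PROMPT_CHUNK tokens and that a
    partially filled last chunk still costs one full pass.
    """
    if not 1 <= prompt_tokens <= MAX_CONTEXT:
        raise ValueError(f"prompt length must be between 1 and {MAX_CONTEXT} tokens")
    return math.ceil(prompt_tokens / PROMPT_CHUNK)


if __name__ == "__main__":
    for n in (1, 128, 129, 4096):
        print(n, prompt_processor_iterations(n))
```

Under these assumptions a short prompt needs a single pass while a full 4096-token prompt needs 32, which is why the upper TTFT bound is much larger than the lower one.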

	## Model Details

	- Type: Causal Language Models
	- Training Stage: Pretraining & Post-training
	- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
	- Number of Parameters: 4.0B
	- Number of Parameters (Non-Embedding): 3.6B
	- Context Length Support: Up to 4096 tokens (the limit of this on-device export)
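
To get a feel for what w4a16 quantization means for storage, the parameter count above can be turned into a back-of-the-envelope weight size. This is only an estimate: it assumes every parameter is stored at the stated bit width and ignores quantization encodings, any tensors kept at higher precision, activations, and the KV cache. The function name `approx_weight_bytes` is illustrative.

```python
def approx_weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Illustrative estimate: raw weight storage in bytes at a given bit width."""
    return num_params * bits_per_weight / 8


NUM_PARAMS = 4.0e9  # "Number of Parameters: 4.0B" from Model Details

if __name__ == "__main__":
    w4 = approx_weight_bytes(NUM_PARAMS, 4)      # 4-bit weights (w4a16)
    fp16 = approx_weight_bytes(NUM_PARAMS, 16)   # 16-bit weights, for comparison
    print(f"4-bit weights:  ~{w4 / 1e9:.1f} GB")
    print(f"16-bit weights: ~{fp16 / 1e9:.1f} GB")
```

The actual size of the packaged archive may differ from this estimate.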

	For more details, please refer to the official [Qwen3 Blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

	## Model Download

	| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Target Model | Performance |
	|-------|---------|----------------|-----------|----------------------|--------------|-------------|
	| Qwen3-4B | Snapdragon 8 Elite (QCS9075) | QNN | W4A16 | NPU | [Qwen3-4B-onnx-w4a16.zip](./Qwen3-4B-onnx-w4a16.zip) | [Check in AI Hub](https://aihub.qualcomm.com/) |

	## Model Inference & Conversion

	### Using Qualcomm AI Hub

	You can export and convert this model using [Qualcomm AI Hub Models](https://github.com/quic/ai-hub-models) (minimum package version: 0.48.0):

	```bash
	# Install AI Hub Models
	pip install "qai-hub-models>=0.48.0"

	# Export the model with --zip-assets to generate the required format
	python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output
	```

	> Note: Use the `--zip-assets` argument to ensure the model is saved in the required community repository format.

	## Repository Structure

	```
	Qwen3-4B/
	├── LICENSE
	├── README.md
	├── .gitattributes
	└── Qwen3-4B-onnx-w4a16.zip
	```

	### ONNX Export (Internal structure)

	```
	Qwen3-4B_onnx_w4a16/
	├── tool_versions.yaml
	├── model.onnx
	├── model.data
	├── model.encodings
	├── tokenizer.json
	├── tokenizer_config.json
	└── ...
	```
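
After downloading the archive, it can be useful to confirm that it contains the files listed above before pushing it to a device. The following is a minimal sketch using only the Python standard library. The function name `missing_files` is hypothetical, and the expected set covers only the files named in the listing (the archive may contain more, as the `...` indicates).

```python
import zipfile
from pathlib import PurePosixPath

# File names taken from the directory listing above; the real archive may hold more.
EXPECTED_FILES = {
    "tool_versions.yaml",
    "model.onnx",
    "model.data",
    "model.encodings",
    "tokenizer.json",
    "tokenizer_config.json",
}


def missing_files(zip_path: str) -> set:
    """Hypothetical helper: expected file names that are absent from the archive.

    Only base names are compared, so the check does not depend on the
    name of the top-level folder inside the zip.
    """
    with zipfile.ZipFile(zip_path) as archive:
        present = {
            PurePosixPath(name).name
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }
    return EXPECTED_FILES - present
```

For example, `missing_files("Qwen3-4B-onnx-w4a16.zip")` returns an empty set when every listed file is present.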

	### tool_versions.yaml

	```yaml
	tool_versions:
	  aihm_version: 0.48.0
	  qairt: 2.34.0
	```
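
When re-exporting the model, it may help to check that the locally installed `qai-hub-models` package is at least as new as the `aihm_version` recorded above. The sketch below uses the standard-library `importlib.metadata`. The helper names are hypothetical, and the version parsing is a deliberate simplification that only handles plain dotted numeric versions such as `0.48.0`.

```python
from importlib import metadata

RECORDED_AIHM_VERSION = "0.48.0"  # aihm_version from tool_versions.yaml


def version_tuple(version: str) -> tuple:
    """Parse a plain dotted numeric version (no pre-release or local tags)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Hypothetical helper: True if `installed` is the same as or newer than `required`."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("qai-hub-models")
        print(installed, is_at_least(installed, RECORDED_AIHM_VERSION))
    except metadata.PackageNotFoundError:
        print("qai-hub-models is not installed")
    except ValueError:
        print("installed version is not a plain dotted numeric version")
```

A full version comparison following Python packaging rules would need a dedicated parser; the tuple comparison here is only meant for simple release numbers.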

	## License

	- Source Model: [APACHE-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)
	- Deployable Model: [APACHE-2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)

	## Disclaimer

	This is a community contribution. The models hosted here are user contributions and:
	- Are not verified by the organization or maintainers for correctness, safety, or performance.
	- May contain errors, bugs, or limitations.
	- Are moderated only for structural compliance, not for content quality.

	The organization and maintainers do not take responsibility for the models or assets contributed here. Use them at your own discretion.