Qwen3-4B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

Qwen3-4B is a state-of-the-art multilingual base language model with 4 billion parameters, excelling in language understanding, generation, coding, and mathematics.

Model Conversion Contributor: carrycooldude

Model Stats:

  • Input sequence length for Prompt Processor: 128
  • Maximum context length: 4096
  • Quantization Type: w4a16 (4-bit weights with 16-bit activations)
  • Supported languages: 100+ languages and dialects
  • TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
  • Response Rate: Rate of response generation after the first response token. Measured on a short prompt with a long response; may slow down when using longer context lengths.
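The TTFT range above follows directly from the prompt-processor chunking: a prompt is consumed in ceil(prompt length / 128) passes, from one pass for a short prompt up to 32 passes at the full 4096-token context. A minimal sketch of that arithmetic (the function name and exact scheduling are our illustration, not part of the AI Hub API):

```python
import math

# Constants from the model stats above (assumed fixed for this build).
PROMPT_PROC_SEQ_LEN = 128   # tokens consumed per prompt-processor pass
MAX_CONTEXT_LEN = 4096      # maximum supported context length

def prompt_processor_iterations(prompt_tokens: int) -> int:
    """How many prompt-processor passes a prompt of the given length needs.

    Illustrative only: real runtimes may pad or schedule differently.
    """
    if not 0 < prompt_tokens <= MAX_CONTEXT_LEN:
        raise ValueError(f"prompt must be 1..{MAX_CONTEXT_LEN} tokens")
    return math.ceil(prompt_tokens / PROMPT_PROC_SEQ_LEN)

print(prompt_processor_iterations(100))   # short prompt: 1 pass (TTFT lower bound)
print(prompt_processor_iterations(4096))  # full context: 32 passes (TTFT upper bound)
```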

Model Details

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and GQA (Grouped Query Attention)
  • Number of Parameters: 4.0B
  • Number of Parameters (Non-Embedding): 3.6B
  • Context Length Support: Up to 4096 tokens (optimized for on-device)

For more details, please refer to the official Qwen3 Blog, GitHub, and Documentation.

Model Download

  • Model: Qwen3-4B
  • Chipset: Snapdragon 8 Elite (QCS9075)
  • Target Runtime: QNN
  • Precision: W4A16
  • Primary Compute Unit: NPU
  • Target Model: Qwen3-4B-onnx-w4a16.zip
  • Performance: Check in AI Hub

Model Inference & Conversion

Using Qualcomm AI Hub

You can export and convert this model using Qualcomm AI Hub Models (minimum package version: 0.48.0):

# Install AI Hub Models (quote the requirement so the shell does not treat ">" as a redirect)
pip install "qai-hub-models>=0.48.0"

# Export the model with --zip-assets to generate the required format
python -m qai_hub_models.models.qwen3_4b.export --target-runtime genie --chipset qcs9075 --zip-assets --output-dir ./output

Note: Use the --zip-assets argument to ensure the model is saved in the required community repository format.

Repository Structure

Qwen3-4B/
β”œβ”€β”€ LICENSE
β”œβ”€β”€ README.md
β”œβ”€β”€ .gitattributes
└── Qwen3-4B-onnx-w4a16.zip

ONNX Export (Internal structure)

Qwen3-4B_onnx_w4a16/
β”œβ”€β”€ tool_versions.yaml
β”œβ”€β”€ model.onnx
β”œβ”€β”€ model.data
β”œβ”€β”€ model.encodings
β”œβ”€β”€ tokenizer.json
β”œβ”€β”€ tokenizer_config.json
└── ...

tool_versions.yaml

tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
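The two version fields can be read back as a sanity check without a YAML dependency. A minimal sketch (the parsing helper is ours, not part of qai-hub-models, and handles only the flat two-level structure shown above):

```python
def read_tool_versions(text: str) -> dict[str, str]:
    """Parse the flat key: value pairs from a tool_versions.yaml file.

    Minimal by design: supports only the structure shown above, not general YAML.
    """
    versions = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.endswith(":"):  # skip blanks and the section header
            continue
        key, _, value = stripped.partition(":")
        versions[key.strip()] = value.strip()
    return versions

sample = """\
tool_versions:
  aihm_version: 0.48.0
  qairt: 2.34.0
"""
print(read_tool_versions(sample))  # {'aihm_version': '0.48.0', 'qairt': '2.34.0'}
```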

License

See the LICENSE file included in this repository.

Model Tree

  • Repository: qualcomm-ai-hub-community/Qwen3-4B-Instruct-carrycooldude
  • Base model: Qwen/Qwen3-4B