| --- |
| license: apache-2.0 |
| language: |
| - en |
| - zh |
| base_model: Qwen/Qwen3-8B |
| pipeline_tag: text-generation |
| tags: |
| - language model |
| - parallel-decoding |
| --- |
| |
| # WeDLM-8B |
|
|
| **WeDLM-8B** is a diffusion language model, initialized from [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), that performs parallel decoding under standard causal attention. |
|
|
| This is the **base (pretrained)** version. For the instruction-tuned version, see [WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct). |
|
|
| Paper (Coming Soon) | [Project Page](https://wedlm.github.io) | [GitHub](https://github.com/tencent/WeDLM) |
|
|
| ## Model Details |
|
|
| | Attribute | Value | |
| |:----------|:------| |
| | Initialized From | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | |
| | Parameters | 8B | |
| | Context Length | 32,768 tokens | |
|
|
| ## Quick Start (Recommended) |
|
|
| For **fast inference**, use the `wedlm` engine: |
|
|
| ```bash |
| pip install git+https://github.com/tencent/WeDLM.git |
| ``` |
|
|
| ```python |
| from wedlm import LLM, SamplingParams |
| |
| llm = LLM(model="tencent/WeDLM-8B") |
| |
| prompt = "The theory of relativity states that" |
| outputs = llm.generate([prompt], SamplingParams(max_tokens=256)) |
| |
| print(outputs[0]["text"]) |
| ``` |
|
|
| ## HuggingFace Transformers |
|
|
| For **training** or simple forward passes, you can load the model via Transformers: |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained( |
|     "tencent/WeDLM-8B", |
|     trust_remote_code=True, |
|     torch_dtype="auto", |
|     device_map="auto", |
| ) |
| |
| # Tokenize the prompt and run a single forward pass (no generation loop). |
| inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device) |
| outputs = model(**inputs) |
| ``` |
|
|
| > ⚠️ **Note:** The HuggingFace interface is intended for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above. |
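To make the result of a forward pass more concrete, the sketch below shows how per-position scores are usually read. It assumes the output follows the usual Transformers causal-LM convention of logits shaped `(batch, seq_len, vocab_size)`, which this card does not confirm for WeDLM's remote code. Plain Python lists stand in for the tensor, and `greedy_next_token`, `softmax`, and `toy_logits` are hypothetical names used only for illustration.

```python
import math


def greedy_next_token(logits):
    """Return, per batch item, the vocab index with the highest last-position logit.

    `logits` is a nested list shaped [batch][seq_len][vocab_size], a stand-in
    (assumption) for the tensor a causal-LM forward pass conventionally returns.
    """
    next_ids = []
    for sequence in logits:
        last_position = sequence[-1]  # scores for the token that follows the prompt
        best = max(range(len(last_position)), key=last_position.__getitem__)
        next_ids.append(best)
    return next_ids


def softmax(scores):
    """Numerically stable softmax over one list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits: batch of 1, sequence of 2 positions, vocabulary of 4 entries.
toy_logits = [[[0.1, 0.2, 0.3, 0.4], [2.0, -1.0, 0.5, 3.5]]]
next_ids = greedy_next_token(toy_logits)
probs = softmax(toy_logits[0][-1])
print(next_ids, round(probs[next_ids[0]], 3))
```

If the real output object exposes a `logits` tensor, the equivalent PyTorch expression would be `outputs.logits[:, -1, :].argmax(dim=-1)`. This is ordinary greedy next-token selection, not the parallel decoding procedure used by the `wedlm` engine.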
|
|
| ## Performance |
|
|
| | Benchmark | Qwen3-8B | WeDLM-8B | |
| |:----------|:--------:|:--------:| |
| | ARC-C (0-shot) | 92.66 | **92.92** | |
| | GSM8K (3-shot) | 85.97 | **90.20** | |
| | MATH (4-shot) | 50.80 | **53.60** | |
| | HumanEval (4-shot) | 68.90 | **75.00** | |
| | MMLU (5-shot) | 74.03 | **75.46** | |
| | **Average** | 72.61 | **74.72** | |
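As a quick check on the table above, the following sketch computes the per-benchmark gain of WeDLM-8B over Qwen3-8B from the listed scores. It also computes the plain mean of the five listed rows, which comes out to 74.47 and 77.44 rather than the reported 72.61 and 74.72. One possible explanation, which is an assumption and not stated in this card, is that the reported average covers benchmarks not listed here. The variable names are illustrative only.

```python
# Scores copied from the table above: (Qwen3-8B, WeDLM-8B).
scores = {
    "ARC-C (0-shot)": (92.66, 92.92),
    "GSM8K (3-shot)": (85.97, 90.20),
    "MATH (4-shot)": (50.80, 53.60),
    "HumanEval (4-shot)": (68.90, 75.00),
    "MMLU (5-shot)": (74.03, 75.46),
}
reported_average = (72.61, 74.72)  # the "Average" row as printed in the card

# Absolute gain of WeDLM-8B over Qwen3-8B on each listed benchmark.
deltas = {name: round(wedlm - qwen, 2) for name, (qwen, wedlm) in scores.items()}

# Plain mean over the five listed rows, one value per model column.
mean_listed = tuple(round(sum(col) / len(scores), 2) for col in zip(*scores.values()))

# Gain implied by the reported "Average" row.
reported_delta = round(reported_average[1] - reported_average[0], 2)

for name, delta in deltas.items():
    print(f"{name:<20} {delta:+.2f}")
print(f"mean of listed rows: {mean_listed}, reported average: {reported_average}")
```

On the listed rows, the largest gains are on HumanEval and GSM8K, and the smallest is on ARC-C.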
|
|
| ## Citation (Coming soon) |
|
|
|
|
| ## License |
|
|
| Apache 2.0 |