# Qwen3.6-35B-A3B_Opus-Reasoning_BF16
This is a fine-tuned and merged version of the Qwen3.6 35B-A3B model, trained on the high-quality reasoning dataset Crownelius/Opus-4.6-Reasoning-3300x.
The primary goal of this project was to fuse Qwen3.6's native `<think>` chain-of-thought format with the structured analytical reasoning style of the Opus dataset, producing a model that plans, self-corrects, and reasons explicitly before delivering its final answer. The result is a deeply analytical model that scores competitively against much larger dense models while activating only ~3.6B parameters per token (MoE).
## Reasoning Format: Qwen3 Native Thinking Mode
This model uses Qwen3's native thinking tokens. Internal reasoning is emitted inside `<think>...</think>` blocks before the final response. This is enabled by passing `enable_thinking=True` to `apply_chat_template`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "Crownelius/Qwen3.6-35B-A3B_Opus-Reasoning_BF16"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, dtype=torch.bfloat16, device_map="auto")

# Build the prompt with Qwen3's native thinking mode enabled
msgs = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
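To post-process the output, the reasoning can be separated from the final answer. Below is a minimal sketch, assuming the reasoning appears as a literal `<think>...</think>` span in the decoded string (decode with `skip_special_tokens=False` if the think tags are special tokens and would otherwise be stripped); the helper name is illustrative, not part of any API.

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split a decoded completion into (reasoning, final_answer).

    Assumes reasoning is wrapped in a literal <think>...</think> block,
    as in Qwen3's native thinking format. If no block is found, the
    whole string is treated as the final answer.
    """
    start, end = "<think>", "</think>"
    if start in decoded and end in decoded:
        head, _, rest = decoded.partition(start)
        thinking, _, answer = rest.partition(end)
        return thinking.strip(), (head + answer).strip()
    return "", decoded.strip()
```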
## Sample Outputs

Below are raw, unedited samples from this merge:
User: "What is the 10,000th prime number?"
Model (internal `<think>` block): I know that the nth prime number p(n) grows roughly as n ln n (Prime Number Theorem). For n=10,000, p(10,000) ≈ 10,000 * ln(10,000) ≈ 10,000 * 9.21 ≈ 92,100. So it should be around 100,000. I remember that the 10,000th prime number is 104,729. Is 104,729 actually prime? Yes, it's not divisible by any prime ≤ √104729 ≈ 323.6. Standard references confirm it. Is it indeed the 10,000th? Yes, multiple reliable sources (OEIS A000040, mathematical databases, computational verification) confirm this. ✅

Final Answer: The 10,000th prime number is 104,729. This value is well-documented in mathematical references and can be verified using prime-counting algorithms or standard prime tables.
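The model's estimate can be checked directly. The sketch below sieves up to the Prime Number Theorem bound p(n) < n(ln n + ln ln n) (valid for n ≥ 6) that the sample's reasoning alludes to; the helper name is illustrative.

```python
import math

def nth_prime(n: int) -> int:
    """Return the n-th prime (1-indexed) via a sieve of Eratosthenes."""
    # Upper bound p_n < n(ln n + ln ln n) holds for n >= 6; pad for small n.
    limit = 15 if n < 6 else int(n * (math.log(n) + math.log(math.log(n)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross off multiples of p starting at p*p
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    count = 0
    for i in range(2, limit + 1):
        if sieve[i]:
            count += 1
            if count == n:
                return i
    raise ValueError("sieve limit too small")
```

For n = 10,000 this returns 104,729, confirming the sample's answer.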
User: "A runaway trolley will kill 5 people. You can pull a lever to divert it, killing 1. Should you pull it? Reason through it carefully."
Model (internal `<think>` block, abridged): (Utilitarian Perspective) From a utilitarian standpoint, saving 5 lives at the cost of 1 produces a net gain of 4 lives. A strict utilitarian would say: pull the lever. (Deontological Perspective) Pulling the lever makes you actively responsible for the death of the one person. You're using that person as a means to save others, violating the principle that individuals have rights not to be harmed intentionally. (Action vs. Inaction) Studies show most people pull the lever in the classic version, but hesitate in variants (like pushing a person onto the tracks), suggesting direct physical involvement and personal agency matter morally and psychologically. (Synthesis) In practice, if forced to choose, I'd lean toward pulling the lever — not because the math is sufficient, but because inaction in the face of preventable mass harm often carries its own moral weight.

Final Answer: There's no universally correct answer. If prioritizing outcomes and minimizing total harm is your core principle, pulling the lever is justified. If you believe certain actions (like intentionally causing death) are wrong regardless of consequences, then not pulling is morally defensible. Both choices involve tragedy, and moral responsibility doesn't vanish when we step back.
## Benchmarks: ARC Challenge
Evaluated using the EleutherAI `lm-evaluation-harness`.
### 0-Shot (Reasoning in isolation, no examples)
| Model | acc | acc_norm |
|---|---|---|
| Base (Qwen3.6-35B-A3B) | 0.5461 | 0.5418 |
| Merged (Opus Reasoning) | 0.5427 | 0.5520 |
The merge is statistically neutral at 0-shot (within ±0.0146 stderr) — base capability is fully preserved.
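The quoted stderr is consistent with the usual binomial standard error of an accuracy estimate, sqrt(p(1-p)/N). The sketch below assumes N = 1,172, the size of the ARC-Challenge test split; the harness's exact computation may differ slightly.

```python
import math

def acc_stderr(p: float, n: int = 1172) -> float:
    """Binomial standard error of an accuracy estimate over n items.

    n = 1172 is the ARC-Challenge test split size assumed here.
    """
    return math.sqrt(p * (1 - p) / n)

# Base 0-shot acc from the table above gives a stderr of roughly 0.0145,
# close to the quoted +/-0.0146; the 0.0034 acc gap is well inside it.
stderr = acc_stderr(0.5461)
```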
### 25-Shot (Leaderboard-standard, in-context learning)
| Model | acc | acc_norm |
|---|---|---|
| Merged (Opus Reasoning) | 0.6749 | 0.7218 |
## Training Details
- Base Model: Qwen/Qwen3.6-35B-A3B
- Dataset: Crownelius/Opus-4.6-Reasoning-3300x
- Training Framework: Eschaton Engine (Cloudbjorn)
- Format: Merged (Base + LoRA)
- MoE Handling: Router/gate layers excluded from LoRA adapters (`exclude_modules: ["gate", "router"]`)
Training Precision:
- Compute Dtype: bfloat16
LoRA Parameters (Auto-Scaled for 35B via Eschaton Engine — 20B–60B bracket):
- r: 16
- lora_alpha: 32
- target_modules: all-linear
- lora_dropout: 0.05
- bias: none
- task_type: CAUSAL_LM
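Together with the router exclusion above, these parameters map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, assuming a recent PEFT release that supports the `exclude_modules` argument; the actual Eschaton Engine invocation is not public and may differ.

```python
from peft import LoraConfig

# Sketch of the adapter config described above; `exclude_modules` keeps
# LoRA off the MoE router/gate layers (requires a recent PEFT version).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    exclude_modules=["gate", "router"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```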
Hyperparameters:
- Optimizer: 8-bit Paged AdamW
- Effective Batch Size: 32 (via Gradient Accumulation)
- Learning Rate: 2e-5
- LR Scheduler: Linear
- Epochs: 1
- Training Sequence Length: 2048
- Warmup Steps: 50
- Weight Decay: 0.01
- Steps: 3300
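The linear schedule with 50 warmup steps can be sketched as a plain function. This mirrors the shape of Hugging Face's `get_linear_schedule_with_warmup` (linear ramp to the peak LR, then linear decay to zero); the function name and form are illustrative.

```python
def linear_lr(step: int, base_lr: float = 2e-5, warmup: int = 50, total: int = 3300) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))
```

With these hyperparameters the LR ramps from 0 to 2e-5 over the first 50 steps and reaches 0 again at step 3300.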