Qwen3.6-35B-A3B_Opus-Reasoning_BF16

This is a fine-tuned and merged version of the Qwen3.6 35B-A3B model, trained on the high-quality reasoning dataset Crownelius/Opus-4.6-Reasoning-3300x.

The primary goal of this project was to fuse Qwen3.6's native <think> chain-of-thought architecture with the structured analytical reasoning style of the Opus dataset, producing a model that plans, self-corrects, and reasons explicitly before delivering its final answer. The result is a deeply analytical model that scores competitively against much larger dense models while activating only ~3.6B parameters per token (MoE).

Reasoning Format: Qwen3 Native Thinking Mode

This model uses Qwen3's native thinking tokens. Internal reasoning is emitted inside <think>...</think> blocks before the final response. This is enabled by passing enable_thinking=True to apply_chat_template.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "Crownelius/Qwen3.6-35B-A3B_Opus-Reasoning_BF16"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, dtype=torch.bfloat16, device_map="auto")

# enable_thinking=True makes the chat template open a <think> block,
# so the model reasons explicitly before its final answer.
msgs = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
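
To handle the reasoning and the final answer separately, split the completion on the closing tag. A minimal sketch, continuing from the snippet above: it decodes without skip_special_tokens so the tag is guaranteed to survive, then strips Qwen's <|im_end|> terminator.

# Split the completion into the <think> block and the final answer.
gen_ids = out[0][inputs.input_ids.shape[1]:]
raw = tokenizer.decode(gen_ids, skip_special_tokens=False)
reasoning, sep, answer = raw.partition("</think>")
if sep:  # a closing tag was emitted
    print("Reasoning:", reasoning.replace("<think>", "").strip())
    print("Answer:", answer.replace("<|im_end|>", "").strip())
else:    # no closing tag: treat the whole completion as the answer
    print("Answer:", raw.replace("<|im_end|>", "").strip())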

Below are raw, unedited samples from this merge:


User: "What is the 10,000th prime number?"

Model (internal <think> block): I know that the nth prime number p(n) grows roughly as n ln n (Prime Number Theorem). For n=10,000, p(10,000) ≈ 10,000 * ln(10,000) ≈ 10,000 * 9.21 ≈ 92,100. So it should be around 100,000. I remember that the 10,000th prime number is 104,729. Is 104,729 actually prime? Yes, it's not divisible by any prime ≤ √104729 ≈ 323.6. Standard references confirm it. Is it indeed the 10,000th? Yes, multiple reliable sources (OEIS A000040, mathematical databases, computational verification) confirm this. ✅

Final Answer: The 10,000th prime number is 104,729. This value is well-documented in mathematical references and can be verified using prime-counting algorithms or standard prime tables.
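
The claim is easy to check locally; a minimal Sieve of Eratosthenes in plain Python (the limit of 120,000 comfortably exceeds the PNT estimate of ~92,100 quoted above) confirms it:

def nth_prime(n: int, limit: int = 120_000) -> int:
    # Sieve of Eratosthenes over [0, limit).
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    primes = [i for i, flag in enumerate(sieve) if flag]
    return primes[n - 1]

print(nth_prime(10_000))  # 104729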


User: "A runaway trolley will kill 5 people. You can pull a lever to divert it, killing 1. Should you pull it? Reason through it carefully."

Model (internal <think> block, abridged): (Utilitarian Perspective) From a utilitarian standpoint, saving 5 lives at the cost of 1 produces a net gain of 4 lives. A strict utilitarian would say: pull the lever. (Deontological Perspective) Pulling the lever makes you actively responsible for the death of the one person. You're using that person as a means to save others, violating the principle that individuals have rights not to be harmed intentionally. (Action vs. Inaction) Studies show most people pull the lever in the classic version, but hesitate in variants (like pushing a person onto the tracks), suggesting direct physical involvement and personal agency matter morally and psychologically. (Synthesis) In practice, if forced to choose, I'd lean toward pulling the lever — not because the math is sufficient, but because inaction in the face of preventable mass harm often carries its own moral weight.

Final Answer: There's no universally correct answer. If prioritizing outcomes and minimizing total harm is your core principle, pulling the lever is justified. If you believe certain actions (like intentionally causing death) are wrong regardless of consequences, then not pulling is morally defensible. Both choices involve tragedy, and moral responsibility doesn't vanish when we step back.


Benchmarks: ARC Challenge

Evaluated with the EleutherAI lm-evaluation-harness.
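
The exact invocation behind these numbers isn't documented; a representative run through the harness's Python API (lm-eval >= 0.4) would look roughly like this, with num_fewshot set to 0 or 25 to match the two settings below:

from lm_eval import simple_evaluate

# Representative harness call; num_fewshot=0 and num_fewshot=25
# correspond to the two settings reported below.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=Crownelius/Qwen3.6-35B-A3B_Opus-Reasoning_BF16,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size="auto",
)
print(results["results"]["arc_challenge"])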

0-Shot (Reasoning in isolation, no examples)

Model                      acc      acc_norm
Base (Qwen3.6-35B-A3B)     0.5461   0.5418
Merged (Opus Reasoning)    0.5427   0.5520

The merge is statistically neutral at 0-shot: the accuracy delta (0.0034) sits well within the reported standard error of ±0.0146, so base capability is fully preserved.

25-Shot (Leaderboard-standard, in-context learning)

Model                      acc      acc_norm
Merged (Opus Reasoning)    0.6749   0.7218

Training Details

  • Base Model: Qwen/Qwen3.6-35B-A3B
  • Dataset: Crownelius/Opus-4.6-Reasoning-3300x
  • Training Framework: Eschaton Engine (Cloudbjorn)
  • Format: Merged (Base + LoRA)
  • MoE Handling: Router/gate layers excluded from LoRA adapters (exclude_modules: ["gate", "router"]); see the config sketch below

Training Precision:

  • Compute Dtype: bfloat16

LoRA Parameters (Auto-Scaled for 35B via Eschaton Engine — 20B–60B bracket):

  • r: 16
  • lora_alpha: 32
  • target_modules: all-linear
  • lora_dropout: 0.05
  • bias: none
  • task_type: CAUSAL_LM
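
Expressed as a PEFT config, the list above corresponds roughly to the following sketch (exclude_modules requires PEFT >= 0.14; the "gate"/"router" names mirror the MoE handling noted under Training Details):

from peft import LoraConfig

# Adapter config implied by the parameters above; excluding the
# router/gate layers keeps MoE routing out of the adaptation.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    exclude_modules=["gate", "router"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)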

Hyperparameters:

  • Optimizer: 8-bit Paged AdamW
  • Effective Batch Size: 32 (via Gradient Accumulation)
  • Learning Rate: 2e-5
  • LR Scheduler: Linear
  • Epochs: 1
  • Training Sequence Length: 2048
  • Warmup Steps: 50
  • Weight Decay: 0.01
  • Steps: 3300
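
For reference, a hypothetical reconstruction of these settings as Hugging Face TrainingArguments (the actual Eschaton Engine config isn't public; the 4 x 8 batch/accumulation split below is one assumed way to reach the stated effective batch of 32):

from transformers import TrainingArguments

# Hypothetical reconstruction; per-device batch 4 x 8 accumulation
# steps yields the stated effective batch size of 32 on one device.
args = TrainingArguments(
    output_dir="qwen3.6-opus-reasoning",  # assumed name
    per_device_train_batch_size=4,        # assumed split
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    warmup_steps=50,
    weight_decay=0.01,
    optim="paged_adamw_8bit",
    bf16=True,
)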