# Qwen3.5-35B-A3B Chimere Distilled -- BF16 Full Weights

Full-precision BF16 weights of the Chimere distillation (Claude Opus 4.6 into Qwen3.5-35B-A3B).

This is the merged result of base Qwen3.5-35B-A3B + Chimere LoRA adapter. Use this for re-quantization or as a starting point for further fine-tuning. For direct inference, use the GGUF versions instead.
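Merging folds the LoRA update back into the frozen base weights: for each targeted matrix `W`, the merged weight is `W + (alpha/r) * B @ A`, which is what PEFT's `merge_and_unload()` computes per module. A minimal NumPy sketch of that arithmetic, with toy dimensions chosen purely for illustration (the real adapter uses r=64, alpha=64 on 35B-scale matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 6, 2, 64  # toy dims; illustrative only
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01 # LoRA up-projection

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # the merged weight stored in this repo's shards

# After merging, the adapter is redundant: the merged matrix reproduces
# base-plus-adapter outputs exactly (up to floating-point tolerance).
x = rng.standard_normal(d_in)
assert np.allclose(W_merged @ x, W @ x + scaling * (B @ (A @ x)))
```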

## When to use this repo

| Goal | Use this? | Alternative |
|---|---|---|
| Re-quantize to GGUF (Q4_K_M, IQ3_S, etc.) | Yes | -- |
| Fine-tune further (SFT, DPO, etc.) | Yes | Or start from the LoRA adapter |
| Model merging (DARE, TIES, etc.) | Yes | -- |
| Generate imatrix for custom quantization | Yes | -- |
| Run inference locally | No | Use the v1 GGUF or v3 GGUF |
| Run inference on a server | Maybe | Requires ~72 GB VRAM (A100 80 GB, H100, B200) |
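The ~72 GB VRAM figure follows directly from the precision: BF16 stores 2 bytes per parameter, so the weights alone occupy about 70 GB before KV cache and activations. A quick back-of-envelope check:

```python
# BF16 = 2 bytes per parameter; 35B params -> ~70 GB for weights alone,
# before KV cache and activation memory push the total toward ~72 GB.
params = 35e9
weight_bytes = params * 2
print(f"weights alone: {weight_bytes / 1e9:.0f} GB")  # -> weights alone: 70 GB
```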

## Usage

### Re-quantize to GGUF

```bash
# Convert safetensors to GGUF
python3 llama.cpp/convert_hf_to_gguf.py \
    ./Qwen3.5-35B-A3B-Chimere-Distilled-BF16 \
    --outfile chimere-bf16.gguf --outtype bf16

# Quantize (example: Q4_K_M)
llama-quantize chimere-bf16.gguf chimere-Q4_K_M.gguf Q4_K_M

# With imatrix for better quality at low bitrates
llama-quantize --imatrix imatrix.dat chimere-bf16.gguf chimere-IQ3_S.gguf IQ3_S

# RAMP quantization (custom per-tensor overrides, as used for the GGUF releases).
# Note: --custom-q expects a per-tensor override specification as its argument,
# supplied before the input/output filenames.
llama-quantize --imatrix imatrix.dat --custom-q chimere-bf16.gguf chimere-ramp.gguf IQ3_S
```

### Load with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",  # Requires ~72 GB VRAM
)

messages = [{"role": "user", "content": "Write a Python function to parse JSON."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### Fine-tune further

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained(
    "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16",
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
# Continue with your SFT trainer...
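To gauge how cheap such a fine-tune is, note that each adapted matrix of shape `(d_out, d_in)` contributes only `r * (d_in + d_out)` trainable weights. The sketch below counts them for a hypothetical config; the layer count and projection dimensions are illustrative guesses, not the actual Qwen3.5-35B-A3B architecture:

```python
# Trainable LoRA parameter count: r * (d_in + d_out) per adapted matrix.
# All dimensions below are assumed for illustration only.
r = 64
layers = 48                  # assumed layer count
targets = {                  # (d_out, d_in) per targeted projection (assumed)
    "q_proj": (4096, 4096),
    "k_proj": (1024, 4096),
    "v_proj": (1024, 4096),
    "o_proj": (4096, 4096),
    "gate_proj": (11008, 4096),
    "up_proj": (11008, 4096),
    "down_proj": (4096, 11008),
}
trainable = layers * sum(r * (do + di) for do, di in targets.values())
print(f"~{trainable / 1e6:.0f}M trainable parameters")
```

Even with generous dimensions this lands in the low hundreds of millions of parameters, a small fraction of the 35B total, which is why a single-epoch run fits on one GPU.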

## Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3.5-35B-A3B (MoE, 256 experts, ~3.5B active/token) |
| Precision | BF16 (bfloat16) |
| Total size | ~65 GB (14 safetensors shards) |
| Storage on HF | 71.9 GB |
| Distillation source | Claude Opus 4.6 reasoning traces |
| Training method | SFT LoRA r64, completion-only loss |
| VRAM for inference | ~72 GB (A100 80 GB or better) |

## Files

| File | Description |
|---|---|
| `model-00001-of-00014.safetensors` through `model-00014-of-00014.safetensors` | Model weight shards (BF16) |
| `model.safetensors.index.json` | Shard index mapping |
| `config.json` | Model architecture config |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `chat_template.jinja` | Chat template (Qwen3.5 format) |
| `processor_config.json` | Processor config |
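The shard index maps every tensor name to the file that holds it, so tooling can load shards selectively. A minimal sketch of reading that mapping, using a mocked-up index in place of the real file (which lists thousands of tensors across the 14 shards):

```python
from collections import defaultdict

# Mocked-up stand-in for model.safetensors.index.json; the real file's
# "weight_map" has one entry per tensor in the checkpoint.
index = {
    "metadata": {"total_size": 71_900_000_000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
        "lm_head.weight": "model-00014-of-00014.safetensors",
    },
}

# Group tensors by the shard that contains them.
shards = defaultdict(list)
for tensor, shard in index["weight_map"].items():
    shards[shard].append(tensor)

for shard, tensors in sorted(shards.items()):
    print(shard, "->", len(tensors), "tensors")
```

With the actual index file, replace the dict literal with `json.load(open("model.safetensors.index.json"))`.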

## Training Details

This model was trained in two versions:

| Parameter | v1 (code focus) | v3 (instruction focus) |
|---|---|---|
| Dataset | 9,763 samples | 10,191 samples |
| Composition | 37% BFCL v3 + 59% Opus traces + 4% gold | v1 base + IFEval/OPSDC/instruction additions |
| Epochs | 1 (611 steps, batch 16) | 1 (160 steps, batch 64) |
| GPU | NVIDIA B200 | same |
| Cost | ~$5 | ~$2 |

The BF16 weights in this repo correspond to the v1 distillation (code + tools focus). For v3 weights, apply the v3 LoRA to the base Qwen3.5-35B-A3B model.
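The reported step counts are internally consistent with the dataset sizes and batch sizes: one epoch takes `ceil(dataset_size / batch_size)` optimizer steps, with the final partial batch accounting for the rounding. A quick check of both runs:

```python
import math

# steps per epoch = ceil(dataset_size / batch_size), using the figures
# from the training table above.
for name, samples, batch, steps in [
    ("v1", 9_763, 16, 611),
    ("v3", 10_191, 64, 160),
]:
    assert math.ceil(samples / batch) == steps
    print(f"{name}: {samples} samples / batch {batch} -> {steps} steps")
```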

## Citation

```bibtex
@misc{chimere-distilled-2026,
  title={Chimere: Claude Opus 4.6 Distillation of Qwen3.5-35B-A3B MoE for Agentic Local Inference},
  author={Kevletesteur},
  year={2026},
  url={https://huggingface.co/Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16}
}
```