# Qwen3.5-35B-A3B Chimere Distilled -- BF16 Full Weights

Full-precision BF16 weights of the Chimere distillation (Claude Opus 4.6 into Qwen3.5-35B-A3B).

This is the merged result of base Qwen3.5-35B-A3B + Chimere LoRA adapter. Use this for re-quantization or as a starting point for further fine-tuning. For direct inference, use the GGUF versions instead.
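Merging folds the LoRA update back into the frozen base weights: for each targeted matrix `W`, the merged weight is `W + (alpha/r) * B @ A`, which is what PEFT's `merge_and_unload()` computes per module. A minimal NumPy sketch of that arithmetic, with toy dimensions chosen purely for illustration (the real adapter uses r=64, alpha=64 on 35B-scale matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 6, 2, 64  # toy dims; illustrative only
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01 # LoRA up-projection

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # the merged weight stored in this repo's shards

# After merging, the adapter is redundant: the merged matrix reproduces
# base-plus-adapter outputs exactly (up to floating-point tolerance).
x = rng.standard_normal(d_in)
assert np.allclose(W_merged @ x, W @ x + scaling * (B @ (A @ x)))
```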

## When to use this repo

| Goal | Use this? | Alternative |
|---|---|---|
| Re-quantize to GGUF (Q4_K_M, IQ3_S, etc.) | Yes | -- |
| Fine-tune further (SFT, DPO, etc.) | Yes | Or start from the LoRA adapter |
| Model merging (DARE, TIES, etc.) | Yes | -- |
| Generate imatrix for custom quantization | Yes | -- |
| Run inference locally | No | Use the v1 GGUF or v3 GGUF |
| Run inference on a server | Maybe | Requires ~72 GB VRAM (A100 80 GB, H100, B200) |
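The ~72 GB VRAM figure follows directly from the precision: BF16 stores 2 bytes per parameter, so the weights alone occupy about 70 GB before KV cache and activations. A quick back-of-envelope check:

```python
# BF16 = 2 bytes per parameter; 35B params -> ~70 GB for weights alone,
# before KV cache and activation memory push the total toward ~72 GB.
params = 35e9
weight_bytes = params * 2
print(f"weights alone: {weight_bytes / 1e9:.0f} GB")  # -> weights alone: 70 GB
```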

## Usage

### Re-quantize to GGUF

```bash
# Convert safetensors to GGUF
python3 llama.cpp/convert_hf_to_gguf.py \
    ./Qwen3.5-35B-A3B-Chimere-Distilled-BF16 \
    --outfile chimere-bf16.gguf --outtype bf16

# Quantize (example: Q4_K_M)
llama-quantize chimere-bf16.gguf chimere-Q4_K_M.gguf Q4_K_M

# With imatrix for better quality at low bitrates
llama-quantize --imatrix imatrix.dat chimere-bf16.gguf chimere-IQ3_S.gguf IQ3_S

# RAMP quantization (custom per-tensor overrides, as used for the GGUF releases).
# Note: --custom-q expects a per-tensor override specification as its argument,
# supplied before the input/output filenames.
llama-quantize --imatrix imatrix.dat --custom-q chimere-bf16.gguf chimere-ramp.gguf IQ3_S
```

### Load with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",  # Requires ~72 GB VRAM
)

messages = [{"role": "user", "content": "Write a Python function to parse JSON."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### Fine-tune further

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained(
    "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16",
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
# Continue with your SFT trainer...
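To gauge how cheap such a fine-tune is, note that each adapted matrix of shape `(d_out, d_in)` contributes only `r * (d_in + d_out)` trainable weights. The sketch below counts them for a hypothetical config; the layer count and projection dimensions are illustrative guesses, not the actual Qwen3.5-35B-A3B architecture:

```python
# Trainable LoRA parameter count: r * (d_in + d_out) per adapted matrix.
# All dimensions below are assumed for illustration only.
r = 64
layers = 48                  # assumed layer count
targets = {                  # (d_out, d_in) per targeted projection (assumed)
    "q_proj": (4096, 4096),
    "k_proj": (1024, 4096),
    "v_proj": (1024, 4096),
    "o_proj": (4096, 4096),
    "gate_proj": (11008, 4096),
    "up_proj": (11008, 4096),
    "down_proj": (4096, 11008),
}
trainable = layers * sum(r * (do + di) for do, di in targets.values())
print(f"~{trainable / 1e6:.0f}M trainable parameters")
```

Even with generous dimensions this lands in the low hundreds of millions of parameters, a small fraction of the 35B total, which is why a single-epoch run fits on one GPU.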

## Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3.5-35B-A3B (MoE, 256 experts, ~3.5B active/token) |
| Precision | BF16 (bfloat16) |
| Total size | ~65 GB (14 safetensors shards) |
| Storage on HF | 71.9 GB |
| Distillation source | Claude Opus 4.6 reasoning traces |
| Training method | SFT LoRA r64, completion-only loss |
| VRAM for inference | ~72 GB (A100 80 GB or better) |

## Files

| File | Description |
|---|---|
| `model-00001-of-00014.safetensors` through `model-00014-of-00014.safetensors` | Model weight shards (BF16) |
| `model.safetensors.index.json` | Shard index mapping |
| `config.json` | Model architecture config |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `chat_template.jinja` | Chat template (Qwen3.5 format) |
| `processor_config.json` | Processor config |
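The shard index maps every tensor name to the file that holds it, so tooling can load shards selectively. A minimal sketch of reading that mapping, using a mocked-up index in place of the real file (which lists thousands of tensors across the 14 shards):

```python
from collections import defaultdict

# Mocked-up stand-in for model.safetensors.index.json; the real file's
# "weight_map" has one entry per tensor in the checkpoint.
index = {
    "metadata": {"total_size": 71_900_000_000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
        "lm_head.weight": "model-00014-of-00014.safetensors",
    },
}

# Group tensors by the shard that contains them.
shards = defaultdict(list)
for tensor, shard in index["weight_map"].items():
    shards[shard].append(tensor)

for shard, tensors in sorted(shards.items()):
    print(shard, "->", len(tensors), "tensors")
```

With the actual index file, replace the dict literal with `json.load(open("model.safetensors.index.json"))`.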

## Training Details

This model was trained in two versions:

| Parameter | v1 (code focus) | v3 (instruction focus) |
|---|---|---|
| Dataset | 9,763 samples | 10,191 samples |
| Composition | 37% BFCL v3 + 59% Opus traces + 4% gold | v1 base + IFEval/OPSDC/instruction additions |
| Epochs | 1 (611 steps, batch 16) | 1 (160 steps, batch 64) |
| GPU | NVIDIA B200 | same |
| Cost | ~$5 | ~$2 |

The BF16 weights in this repo correspond to the v1 distillation (code + tools focus). For v3 weights, apply the v3 LoRA to the base Qwen3.5-35B-A3B model.
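The reported step counts are internally consistent with the dataset sizes and batch sizes: one epoch takes `ceil(dataset_size / batch_size)` optimizer steps, with the final partial batch accounting for the rounding. A quick check of both runs:

```python
import math

# steps per epoch = ceil(dataset_size / batch_size), using the figures
# from the training table above.
for name, samples, batch, steps in [
    ("v1", 9_763, 16, 611),
    ("v3", 10_191, 64, 160),
]:
    assert math.ceil(samples / batch) == steps
    print(f"{name}: {samples} samples / batch {batch} -> {steps} steps")
```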

## Citation

```bibtex
@misc{chimere-distilled-2026,
  title={Chimere: Claude Opus 4.6 Distillation of Qwen3.5-35B-A3B MoE for Agentic Local Inference},
  author={Kevletesteur},
  year={2026},
  url={https://huggingface.co/Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16}
}
```