# Qwen3.5-35B-A3B Chimere Distilled -- BF16 Full Weights

Full-precision BF16 weights of the Chimere distillation (Claude Opus 4.6 distilled into Qwen3.5-35B-A3B).

This is the merged result of the base Qwen3.5-35B-A3B model plus the Chimere LoRA adapter. Use it for re-quantization or as a starting point for further fine-tuning. For direct inference, use the GGUF versions instead.
## When to use this repo
| Goal | Use this? | Alternative |
|---|---|---|
| Re-quantize to GGUF (Q4_K_M, IQ3_S, etc.) | Yes | -- |
| Fine-tune further (SFT, DPO, etc.) | Yes | Or start from LoRA adapter |
| Model merging (DARE, TIES, etc.) | Yes | -- |
| Generate imatrix for custom quantization | Yes | -- |
| Run inference locally | No | Use v1 GGUF or v3 GGUF |
| Run inference on a server | Maybe | Requires ~72 GB VRAM (A100 80 GB, H100, B200) |
## Usage

### Re-quantize to GGUF

```bash
# Convert safetensors to GGUF
python3 llama.cpp/convert_hf_to_gguf.py \
  ./Qwen3.5-35B-A3B-Chimere-Distilled-BF16 \
  --outfile chimere-bf16.gguf --outtype bf16

# Quantize (example: Q4_K_M)
llama-quantize chimere-bf16.gguf chimere-Q4_K_M.gguf Q4_K_M

# With an imatrix for better quality at low bitrates
llama-quantize --imatrix imatrix.dat chimere-bf16.gguf chimere-IQ3_S.gguf IQ3_S

# RAMP quantization (custom per-tensor overrides, as used for the GGUF releases).
# --custom-q expects a comma-separated list of tensor-regex=type pairs.
llama-quantize --imatrix imatrix.dat --custom-q "<regex=type,...>" chimere-bf16.gguf chimere-ramp.gguf IQ3_S
```
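The `imatrix.dat` file used above can be produced with llama.cpp's `llama-imatrix` tool. A minimal sketch; `calibration.txt` is a placeholder, and you should supply calibration text representative of your target workload:

```shell
# Generate an importance matrix from the BF16 GGUF over a calibration corpus.
# calibration.txt is a placeholder filename, not shipped with this repo.
llama-imatrix -m chimere-bf16.gguf -f calibration.txt -o imatrix.dat
```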
### Load with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",  # requires ~72 GB VRAM
)

messages = [{"role": "user", "content": "Write a Python function to parse JSON."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
### Fine-tune further

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained(
    "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16",
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Continue with your SFT trainer...
```
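For a rough sense of adapter size: LoRA adds `r * (d_in + d_out)` parameters per adapted weight matrix (the A matrix is `r x d_in`, the B matrix is `d_out x r`). A minimal sketch; the 4096 hidden size below is hypothetical, for illustration only:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

# Hypothetical square attention projection with hidden size 4096 and r=64:
print(lora_param_count(4096, 4096, 64))  # 524288 extra parameters for this matrix
```

Summing this over every matrix listed in `target_modules`, across all layers, gives the total adapter size.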
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3.5-35B-A3B (MoE, 256 experts, ~3.5B active/token) |
| Precision | BF16 (bfloat16) |
| Total size | ~65 GB (14 safetensors shards) |
| Storage on HF | 71.9 GB |
| Distillation source | Claude Opus 4.6 reasoning traces |
| Training method | SFT LoRA r64, completion-only loss |
| VRAM for inference | ~72 GB (A100 80 GB or better) |
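The two size figures in the table are consistent once units are accounted for: at 2 bytes per BF16 parameter, roughly 35B parameters occupy about 70 GB (decimal gigabytes) or about 65 GiB (binary, which file browsers often report as "GB"). A quick check of that arithmetic; the ~72 GB VRAM figure adds KV cache and runtime overhead on top:

```python
params = 35e9              # approximate total parameter count
bytes_total = params * 2   # BF16 = 2 bytes per parameter
gb = bytes_total / 1e9     # decimal gigabytes
gib = bytes_total / 2**30  # binary gibibytes
print(round(gb, 1), round(gib, 1))  # 70.0 65.2
```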
## Files

| File | Description |
|---|---|
| `model-00001-of-00014.safetensors` through `model-00014-of-00014.safetensors` | Model weight shards (BF16) |
| `model.safetensors.index.json` | Shard index mapping |
| `config.json` | Model architecture config |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `chat_template.jinja` | Chat template (Qwen3.5 format) |
| `processor_config.json` | Processor config |
## Training Details
This model was trained in two versions:
| Parameter | v1 (code focus) | v3 (instruction focus) |
|---|---|---|
| Dataset | 9,763 samples | 10,191 samples |
| Composition | 37% BFCL v3 + 59% Opus traces + 4% gold | v1 base + IFEval/OPSDC/instruction additions |
| Epochs | 1 (611 steps, batch 16) | 1 (160 steps, batch 64) |
| GPU | NVIDIA B200 | same |
| Cost | ~$5 | ~$2 |
The BF16 weights in this repo correspond to the v1 distillation (code + tools focus). For v3 weights, apply the v3 LoRA to the base Qwen3.5-35B-A3B model.
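The step counts in the table follow directly from the dataset and batch sizes, since both runs are a single epoch (steps = ceil(samples / batch)):

```python
import math

def steps_per_epoch(samples: int, batch_size: int) -> int:
    """Optimizer steps for one epoch at a given effective batch size."""
    return math.ceil(samples / batch_size)

print(steps_per_epoch(9763, 16))   # 611 (v1)
print(steps_per_epoch(10191, 64))  # 160 (v3)
```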
## Related
- Chimere v1 GGUF -- v1 RAMP quantized, ready for inference
- Chimere v3 GGUF -- v3 RAMP quantized, ready for inference
- LoRA adapter -- LoRA weights (requires base model)
- Base model: Qwen3.5-35B-A3B
- GitHub: Chimere
- GitHub: Chimere ODO
## Citation

```bibtex
@misc{chimere-distilled-2026,
  title={Chimere: Claude Opus 4.6 Distillation of Qwen3.5-35B-A3B MoE for Agentic Local Inference},
  author={Kevletesteur},
  year={2026},
  url={https://huggingface.co/Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-BF16}
}
```