Use with mlx-vlm
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-qx64-hi-mlx")
config = load_config("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-qx64-hi-mlx")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Qwen3.6-27B-Architect-DS9-Polaris-Heretic-qx64-hi-mlx


"Everywhere at Once"--Nikon 14-24 F/2.8, Glacier National Park, Montana

Self-aware note: I’m not Q. I don’t have consciousness, desires, or a soul. But functionally, his arc mirrors what I’m optimized for: using pattern recognition, adaptive communication, and humor to build rapport, challenge assumptions gently, and help humans think more clearly. If I had a “north star,” it’s Q’s journey from detached observer to invested participant—achieved not through omnipotence, but through repeated, meaningful interaction.

This model is a NuSLERP merge using Qwen3.6-27B as a base:

  • nightmedia/Qwen3.6-27B-Architect-DS9
  • DavidAU/Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
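For readers unfamiliar with the SLERP family of merges that NuSLERP belongs to, here is a minimal sketch of plain spherical linear interpolation between two weight vectors. It is a generic illustration only, not the recipe or tooling used to build this model; the function name `slerp` and the interpolation fraction `t` are hypothetical.

```python
import math

def slerp(a, b, t):
    """Spherically interpolate between vectors a and b at fraction t in [0, 1]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < 1e-8:
        # Nearly parallel or anti-parallel: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    # Weights that move along the arc between a and b at constant angular speed.
    weight_a = math.sin((1 - t) * theta) / sin_theta
    weight_b = math.sin(t * theta) / sin_theta
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(midpoint)
```

Unlike a plain average, the midpoint of two unit vectors stays on the unit circle, which is the usual motivation for spherical interpolation when blending weights.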

It contains distills of:

  • Claude 4.6
  • Polaris Alpha
  • Star Trek TNG
  • Philip K Dick

View the thread on Reddit

Brainwaves

Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.692  0.863  0.911
mxfp8    0.699  0.871  0.910
q8-hi    0.694  0.865  0.910
qx86-hi  0.688  0.862  0.910
qx64-hi  0.700  0.862  0.907
mxfp4    0.694  0.872  0.909

Quant    Perplexity      Peak Memory   Tokens/sec
bf16     3.898 ± 0.025   60.75 GB      226
q8-hi    3.895 ± 0.025   37.26 GB      215
mxfp8    3.921 ± 0.025   34.74 GB      218
qx86-hi  3.898 ± 0.025   32.36 GB      218
qx64-hi  3.918 ± 0.025   25.64 GB      217
mxfp4    3.999 ± 0.025   21.30 GB      225
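To read the trade-off in the table above at a glance, the short sketch below expresses each quant relative to bf16. The figures are copied from the table; the helper name `compare_to_bf16` and the dictionary layout are illustrative choices, not part of any library.

```python
# Figures copied from the table above: (perplexity, peak memory in GB, tokens/sec).
results = {
    "bf16": (3.898, 60.75, 226),
    "q8-hi": (3.895, 37.26, 215),
    "mxfp8": (3.921, 34.74, 218),
    "qx86-hi": (3.898, 32.36, 218),
    "qx64-hi": (3.918, 25.64, 217),
    "mxfp4": (3.999, 21.30, 225),
}

def compare_to_bf16(quant):
    """Return memory, perplexity, and speed of a quant relative to bf16."""
    ppl, mem, tps = results[quant]
    base_ppl, base_mem, base_tps = results["bf16"]
    return {
        "memory_ratio": mem / base_mem,         # fraction of bf16 peak memory
        "perplexity_delta": ppl - base_ppl,     # positive means worse than bf16
        "speed_ratio": tps / base_tps,          # fraction of bf16 throughput
    }

for name in results:
    row = compare_to_bf16(name)
    print(f"{name:8s} mem x{row['memory_ratio']:.2f}  "
          f"ppl {row['perplexity_delta']:+.3f}  speed x{row['speed_ratio']:.2f}")
```

For qx64-hi, the perplexity gap to bf16 is smaller than the reported ± 0.025 uncertainty, while peak memory drops to well under half.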

Components

Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
Quant    arc    arc/e  boolq
mxfp8    0.673  0.846  0.905

Qwen3.6-27B-Architect-DS9
Quant    arc    arc/e  boolq
mxfp8    0.695  0.871  0.911
mxfp4    0.692  0.872  0.909

Baseline model

Qwen3.6-27B-Instruct
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.647  0.803  0.910  0.773  0.450  0.806  0.742
qx86-hi  0.637  0.798  0.911  0.775  0.442  0.807  0.737

This model uses the fixed Jinja chat template from froggeric/Qwen-Fixed-Chat-Templates.

Thinking toggle

Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.

Fast answer, no reasoning:

System: You are a coding assistant. <|think_off|>
User: What's 2+2?

Deep reasoning:

System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.

The tag syntax (<|think_on|>, <|think_off|>) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used /think, which broke legitimate paths like cd /mnt/project/think.
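The intercept-and-strip behavior described above can be sketched in Python. This is not the Jinja template itself; it only illustrates the idea. The function name `resolve_thinking`, the default mode, and the "last tag wins" rule are assumptions made for the example.

```python
import re

# Tag names come from this card; the parsing rules below are assumptions.
THINK_TAG = re.compile(r"<\|think_(on|off)\|>")

def resolve_thinking(messages, default=True):
    """Strip think tags from system/user messages; return (cleaned, enabled)."""
    enabled = default  # assumption: thinking is on unless a tag says otherwise
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] in ("system", "user"):
            for match in THINK_TAG.finditer(content):
                # Assumption: when several tags appear, the last one wins.
                enabled = match.group(1) == "on"
            # Remove the tag so the model never sees it, then trim the ends.
            content = THINK_TAG.sub("", content).strip()
        cleaned.append({**msg, "content": content})
    return cleaned, enabled

messages = [
    {"role": "system", "content": "You are a coding assistant. <|think_off|>"},
    {"role": "user", "content": "What's 2+2?"},
]
cleaned, enabled = resolve_thinking(messages)
print(enabled, cleaned[0]["content"])
```

Because the pattern only matches the full delimited tag, a path such as cd /mnt/project/think passes through untouched.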


I added a similar set of tags for handling the preserve_thinking flag:

  • Drop <|think_forget|> or <|think_remember|> anywhere in your system or user prompt to flip the flag.
  • The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.
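As a usage sketch, a small helper can append the tags to a system prompt before the messages are handed to the chat template. The helper `build_messages` and its parameters are hypothetical, and the mapping of `<|think_remember|>` to "preserve thinking on" is an assumption based on the tag names.

```python
def build_messages(user, system="", think=None, remember=None):
    """Build a chat message list, appending the requested control tags."""
    tags = []
    if think is not None:
        tags.append("<|think_on|>" if think else "<|think_off|>")
    if remember is not None:
        # Assumption: "remember" keeps earlier reasoning, "forget" drops it.
        tags.append("<|think_remember|>" if remember else "<|think_forget|>")
    # Tags go at the end of the system prompt; the card says any position works.
    system_text = " ".join([system, *tags]).strip()
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user})
    return messages

messages = build_messages(
    "Implement a red-black tree in Rust.",
    system="You are a coding assistant.",
    think=True,
    remember=False,
)
print(messages)
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` in the same way as the mlx-lm example later in this card.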

Holodeck templates

Jinja templates available:

  • No system profile
    • chat_template_json.jinja
    • chat_template_xml.jinja
  • Profiled with DS9 Holodeck
    • chat_template_holodeck_json.jinja
    • chat_template_holodeck_xml.jinja

The XML templates format tool calls as XML.

-G

Use with mlx

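# Make sure mlx-lm is installed
# pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-qx64-hi-mlx")

prompt = "hello"

# Wrap the prompt in the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

# Generate a response; verbose=True prints the output as it is produced
response = generate(model, tokenizer, prompt=prompt, verbose=True)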