All scores below are for the qx86-hi quantization.

| Model                        | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|------------------------------|-------|-------|-------|-------|-------|-------|-------|
| Brainwaves                   | 0.572 | 0.767 | 0.846 | 0.716 | 0.406 | 0.798 | 0.679 |
| Qwen3-VL-8B-Instruct-heretic | 0.437 | 0.583 | 0.874 | 0.526 | 0.412 | 0.742 | 0.583 |
| Qwen3-VL-8B-Instruct         | 0.455 | 0.596 | 0.872 | 0.543 | 0.424 | 0.736 | 0.593 |
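For a quick read on where Brainwaves gains or loses ground against the base model, the per-task deltas can be computed straight from the table above (a minimal sketch; the scores are copied verbatim from the table, and the dictionary names are just illustrative):

```python
# Scores copied from the table above (qx86-hi quantization).
brainwaves = {"arc": 0.572, "arc/e": 0.767, "boolq": 0.846, "hswag": 0.716,
              "obkqa": 0.406, "piqa": 0.798, "wino": 0.679}
base = {"arc": 0.455, "arc/e": 0.596, "boolq": 0.872, "hswag": 0.543,
        "obkqa": 0.424, "piqa": 0.736, "wino": 0.593}

# Positive delta = Brainwaves scores higher than the base Qwen3-VL-8B-Instruct.
for task, score in brainwaves.items():
    print(f"{task:6s} {score - base[task]:+.3f}")
```

The largest gains are on hswag (+0.173) and arc/e (+0.171), while boolq (-0.026) and obkqa (-0.018) dip slightly.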
```sh
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer.
model, tokenizer = load("Qwen3-VL-8B-GLM-4.7-Flash-Heretic-Uncensored-Thinking-qx86-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
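For interactive use, the same model can also be run token by token (a minimal sketch, assuming a recent mlx-lm release where `stream_generate` is exported and yields response chunks with a `.text` field):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Qwen3-VL-8B-GLM-4.7-Flash-Heretic-Uncensored-Thinking-qx86-hi-mlx")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print each chunk of text as it is generated, instead of waiting for the full response.
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```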
Base model: Qwen/Qwen3-VL-8B-Instruct