New Models
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-------|-------|-------|-------|-------|-------|-------|
| mxfp8 | 0.351 | 0.501 | 0.733 | 0.462 | 0.348 | 0.682 | 0.573 |
| q8-hi | 0.363 | 0.501 | 0.777 | 0.466 | 0.364 | 0.695 | 0.548 |
| q8    | 0.363 | 0.505 | 0.779 | 0.466 | 0.362 | 0.695 | 0.553 |
| q6-hi | 0.354 | 0.503 | 0.773 | 0.465 | 0.370 | 0.693 | 0.558 |
| q6    | 0.357 | 0.503 | 0.769 | 0.462 | 0.370 | 0.695 | 0.543 |
| q5-hi | 0.348 | 0.493 | 0.771 | 0.461 | 0.350 | 0.684 | 0.561 |
| q5    | 0.354 | 0.502 | 0.765 | 0.462 | 0.356 | 0.686 | 0.552 |
| q4-hi | 0.342 | 0.480 | 0.756 | 0.442 | 0.328 | 0.680 | 0.557 |
| q4    | 0.349 | 0.487 | 0.749 | 0.445 | 0.356 | 0.670 | 0.550 |
| mxfp4 | 0.339 | 0.489 | 0.738 | 0.433 | 0.330 | 0.672 | 0.553 |
tvall43/Qwen3.5-0.8B-Text-heretic
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-------|-------|-------|-------|-------|-------|-------|
| mxfp8 | 0.348 | 0.502 | 0.635 | 0.461 | 0.338 | 0.682 | 0.571 |
| mxfp4 | 0.333 | 0.495 | 0.673 | 0.432 | 0.330 | 0.670 | 0.552 |
Old model performance
Qwen3-0.6B
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|-------|-------|-------|-------|-------|-------|-------|-------|
| bf16  | 0.298 | 0.354 | 0.378 | 0.415 | 0.344 | 0.649 | 0.534 |
| q8-hi | 0.296 | 0.355 | 0.378 | 0.416 | 0.348 | 0.652 | 0.529 |
| q8    | 0.299 | 0.354 | 0.378 | 0.414 | 0.346 | 0.650 | 0.535 |
| q6-hi | 0.301 | 0.356 | 0.378 | 0.415 | 0.350 | 0.651 | 0.541 |
| q6    | 0.300 | 0.367 | 0.378 | 0.416 | 0.344 | 0.647 | 0.524 |
| mxfp4 | 0.286 | 0.364 | 0.609 | 0.404 | 0.316 | 0.626 | 0.531 |
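The card does not say which harness produced these scores, but the column abbreviations match standard EleutherAI lm-evaluation-harness task names (arc = arc_challenge, arc/e = arc_easy, hswag = hellaswag, obkqa = openbookqa, wino = winogrande). Below is a hedged sketch of producing comparable zero-shot numbers with lm-eval's Python API; the checkpoint id is a placeholder, not the quants evaluated above, and this is not confirmed to be how the table was generated.

```python
# Sketch only: zero-shot accuracy on the same task suite via lm-eval
# (pip install lm-eval). The checkpoint below is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen3-0.6B",  # placeholder, not the MLX quant
    tasks=[
        "arc_challenge",  # arc
        "arc_easy",       # arc/e
        "boolq",
        "hellaswag",      # hswag
        "openbookqa",     # obkqa
        "piqa",
        "winogrande",     # wino
    ],
    num_fewshot=0,
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```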
| Quant | Perplexity | Peak memory |
|-------|----------------|-------------|
| mxfp8 | 6.611 ± 0.049 | 7.65 GB |
| mxfp4 | 7.455 ± 0.057 | 6.33 GB |
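For reference, perplexity can be measured directly with mlx-lm's Python API. The sketch below is illustrative only: the card does not state the evaluation text or context length behind the table above, and the model path is assumed to be the local mxfp8 quant.

```python
# Minimal perplexity sketch: exp(mean next-token negative log-likelihood).
# The text and model path here are assumptions, not the card's setup.
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("Qwen3.5-0.8B-mxfp8-mlx")

text = "MLX is an array framework for machine learning on Apple silicon."
tokens = mx.array(tokenizer.encode(text))

# Logits for positions 0..n-2 predict tokens 1..n-1.
logits = model(tokens[None, :-1])
nll = nn.losses.cross_entropy(logits.squeeze(0), tokens[1:])
print(f"perplexity: {mx.exp(nll.mean()).item():.3f}")
```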
More metrics coming soon.
-G
Use with mlx-lm:

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3.5-0.8B-mxfp8-mlx")

prompt = "hello"

# Apply the model's chat template when the tokenizer provides one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
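For incremental output, mlx-lm also provides stream_generate, which yields partial responses as tokens are decoded. A short sketch under the same assumptions (model path, chat template) as the snippet above:

```python
# Streaming variant of the snippet above; prints text as it is generated.
from mlx_lm import load, stream_generate

model, tokenizer = load("Qwen3.5-0.8B-mxfp8-mlx")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=256):
    print(response.text, end="", flush=True)
print()
```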