---
language:
- en
- zh
library_name: mlx
license: other
license_name: modified-mit
pipeline_tag: text-generation
base_model: MiniMaxAI/MiniMax-M2
tags:
- moe
- mixture-of-experts
- minimax_m2
- quantized
- apple-silicon
- mlx
- turboquant
- jangtq
- jangtq2
- reap
---
<p align="center">
<a href="https://osaurus.ai"><img src="./osaurus-x-banner.png" alt="Osaurus AI"></a>
</p>
<h3 align="center">MiniMax M2.7 Small &mdash; 138B-A10B &mdash; JANGTQ (MLX)</h3>
<p align="center"><b>This is now a ~138B-A10B MoE &mdash; 38 GB on disk</b> (down from MiniMax M2's ~460 GB / 230B base) &mdash; 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization &mdash; routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
<p align="center">
<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>&nbsp;
<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>&nbsp;
<a href="https://huggingface.co/MiniMaxAI/MiniMax-M2"><img src="https://img.shields.io/badge/Base-MiniMax--M2-orange?logo=huggingface" alt="MiniMax M2"></a>
</p>
---
## Model Details
Runs on Apple Silicon via the JANG toolchain + MLX.
```
MiniMax M2 (base)
↓ v3 calibration corpus (code · agentic · general · academic · science · CN · cyber · systems · long-context)
REAP saliency observer (62 layers × 256 experts → scoring)
↓ 40% expert prune (154 of 256 kept per layer)
JANGTQ2 quantization
• 2-bit MXTQ on routed-expert weights (Hadamard-rotated Lloyd-Max codebook)
• 8-bit affine on attention + dense MLP + embed + lm_head
• 16-bit on norms and router weights
```
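For intuition, the core of the 2-bit step can be sketched in a few lines of numpy. This is a minimal illustration, not the JANGTQ2 implementation: `hadamard`, `lloyd_max_codebook`, and `quantize_expert` are hypothetical names, and the real kernels block, scale, and pack weights differently.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester-construction Hadamard matrix, normalized to be orthogonal."""
    assert n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_codebook(x: np.ndarray, bits: int = 2, iters: int = 20) -> np.ndarray:
    """Lloyd-Max (1-D k-means) codebook: 2 bits -> 4 centroids."""
    k = 1 << bits
    codebook = np.quantile(x, (np.arange(k) + 0.5) / k)  # quantile init
    for _ in range(iters):
        idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
        for j in range(k):
            sel = x[idx == j]
            if sel.size:
                codebook[j] = sel.mean()  # centroid update
    return codebook

def quantize_expert(W: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotate, fit a 4-level codebook, return (indices, codebook)."""
    H = hadamard(W.shape[-1])
    flat = (W @ H).ravel()          # Hadamard rotation smooths outliers
    cb = lloyd_max_codebook(flat)   # one codebook per tensor, for brevity
    idx = np.abs(flat[:, None] - cb[None, :]).argmin(axis=1)
    return idx.reshape(W.shape).astype(np.uint8), cb
```

Dequantization is then `cb[idx] @ H.T`, since the normalized Hadamard matrix is orthogonal.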
| Spec | Value |
|---|---|
| Parameters | **~138B total, ~10B active per token** |
| Routed experts kept | 154 of 256 (60%) |
| Top-k active experts | 8 per token |
| Layers | 62 |
| Bundle size | 38 GB |
| Dtype | bfloat16 activations |
| Attention | Standard Q/K/V + GQA 6:1, head_dim=128, rope_theta=5M |
| Context | 196,608 |
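As a sanity check, the table's own numbers imply an effective bit-rate of roughly 2.2 bits per parameter (treating 1 GB as 10⁹ bytes), consistent with 2-bit experts dominating the bundle plus the 8-bit and 16-bit side tensors:

```python
# Effective bits/param implied by the reported bundle size and parameter
# count; both figures come from the table above.
bundle_bits = 38e9 * 8        # 38 GB on disk
total_params = 138e9          # ~138B total parameters
print(f"{bundle_bits / total_params:.2f} bits/param")  # -> 2.20
```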
## Use
```python
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
from mlx_lm.sample_utils import make_sampler
model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")
messages = [{"role": "user", "content": "Write a Python function that…"}]
prompt = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, tokenize=False
)
# Interleaved-thinking / always-reasoning. Use MiniMax's
# official sampling: temp=1.0, top_p=0.95, top_k=40
out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
               sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
print(out)
```
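For interactive use, a streaming variant is a natural drop-in. This assumes a recent `mlx_lm` where `stream_generate` yields response chunks with a `.text` field; older releases differ.

```python
from mlx_lm import stream_generate

# Same prompt and sampler as above, but print tokens as they arrive.
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=4096,
                             sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40)):
    print(chunk.text, end="", flush=True)
print()
```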
## Evaluation
### HumanEval+ (code generation)
- **Dataset**: `evalplus/humanevalplus` test split (164 prompts, harder tests than HumanEval).
- **Protocol**: sampled pass@1 baseline + pass@5 retry on failures.
- **Sampling for both pass@1 and pass@5 retry**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); max_tokens=5000 on pass@1, 1200 on pass@5; k=5 samples per failed problem, early stop on first pass.
- **Grading**: each candidate run with 20s subprocess timeout; must pass ALL EvalPlus tests.
- **Extractor**: `jang_tools.kimi_prune.bench_humaneval._extract_code` (≥ 2026-04-24). The earlier extractor mis-paired markdown fences when the model emitted token-boundary glitches at the language tag (e.g. ```` ```python一致: ````, ```` ```pythonfr ````) and when the chat template prefilled `<think>` at the prompt boundary, costing roughly nine points of pass@1; a sketch of a tolerant extractor follows below.
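A minimal sketch of a fence extractor that tolerates such glitched language tags; the actual `_extract_code` in `jang_tools` may use different heuristics.

```python
import re

# Tolerate arbitrary junk between ``` and the newline (e.g. "```python一致:"),
# then capture everything up to the closing fence.
FENCE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)

def extract_code(completion: str) -> str | None:
    # Drop any interleaved-thinking prefix before looking for fences.
    completion = completion.split("</think>")[-1]
    blocks = FENCE.findall(completion)
    # Prefer the last block that looks like Python source.
    for block in reversed(blocks):
        if re.search(r"^\s*(def |class |import |from )", block, re.MULTILINE):
            return block
    return blocks[-1] if blocks else None
```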
| Metric | Score |
|--------|-------|
| **pass@1 (sampled, temp=1.0)** | **81.10%** (133/164) |
| **pass@5 (sampled, retry of failures)** | **90.24%** (148/164) |
After the extractor fix, 30 of the 46 originally counted pass@1 failures resolve cleanly: 15 were correct answers eaten by fence-pairing, and another 15 recover under pass@5 sampling (retry protocol sketched below). The 16 residuals split into ~8 token-budget starvations (`no_code_block`), ~5 in-code 2-bit token-boundary glitches (`return False言`, `Nonef`, etc.), and ~3 genuine logic errors on EvalPlus hidden tests.
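The retry protocol itself is simple enough to sketch. `sample` and `grade` below stand in for the model-sampling and EvalPlus-grading steps (hypothetical callables, not the actual jang_tools harness):

```python
from typing import Callable

def pass_at_k(problem: dict,
              sample: Callable[..., str],
              grade: Callable[..., bool],
              k: int = 5) -> bool:
    """Draw up to k samples at the official sampling params; early-stop on first pass."""
    for _ in range(k):
        code = sample(problem, temp=1.0, top_p=0.95, top_k=40, max_tokens=1200)
        if code and grade(problem, code, timeout=20):  # 20s subprocess grading
            return True
    return False
```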
## Variants
| Variant | Prune | Size | HF |
|---------|-------|------|-----|
| **MiniMax-M2.7-Small** | 40% | 38 GB | `OsaurusAI/MiniMax-M2.7-Small-JANGTQ` |
| MiniMax-M2.7-Med | 25% | ~48 GB | `OsaurusAI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
| MiniMax-M2.7-Large | 10% | ~57 GB | `OsaurusAI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
Also released under `JANGQ-AI/MiniMax-M2.7-*-JANGTQ`.
## Credits
Base model: [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
Methodology: [JANG toolchain](https://github.com/jinho-jang/jang) — REAP saliency + JANGTQ codebook quantization.
Served by: [Osaurus](https://osaurus.ai) — Apple-Silicon-native MLX inference.
## License
Modified MIT — inherited from MiniMax M2.