Instructions for using OsaurusAI/MiniMax-M2.7-JANGTQ with libraries, inference providers, notebooks, and local apps.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-JANGTQ")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OsaurusAI/MiniMax-M2.7-JANGTQ" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OsaurusAI/MiniMax-M2.7-JANGTQ
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (default port 8080)
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OsaurusAI/MiniMax-M2.7-JANGTQ",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
⚠️ REQUIRED — jangtq_runtime.safetensors sidecar must be downloaded

Osaurus uses the native Swift JANGTQ runtime. Every JANGTQ bundle on OsaurusAI ships a small jangtq_runtime.safetensors sidecar (10 KB–165 KB) alongside the weight shards. If the file is absent, the Swift loader will refuse to start with the error:

```
Error: Model '<name>' declares JANGTQ (weight_format: "mxtq") but is missing required sidecar file 'jangtq_runtime.safetensors'. Re-download the full model or obtain the sidecar from the original publisher.
```

If your local copy doesn't have it (older download, partial sync, etc.):

```bash
hf download OsaurusAI/MiniMax-M2.7-JANGTQ jangtq_runtime.safetensors --local-dir <your-dir>
```

The file holds the deterministic codebooks + Hadamard rotation signs the Swift loader uses to decode *.tq_packed weights. It must match the seed the bundle was quantized with (mxtq_seed=42).
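The same sidecar can also be pulled from Python with huggingface_hub (a minimal sketch; the local directory path is a placeholder for wherever your copy of the model lives):

```python
from huggingface_hub import hf_hub_download

# Fetch only the JANGTQ sidecar into an existing local copy of the model.
hf_hub_download(
    repo_id="OsaurusAI/MiniMax-M2.7-JANGTQ",
    filename="jangtq_runtime.safetensors",
    local_dir="path/to/your-local-model",  # placeholder: point at your model directory
)
```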
MiniMax M2.7 — JANGTQ (MLX)
TurboQuant codebook quantization of MiniMax's 228B agentic MoE — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / shared-expert / lm_head at 8-bit affine.
Model Details
| Property | Value |
|---|---|
| Base Model | MiniMaxAI/MiniMax-M2.7 |
| Architecture | MoE (256 experts, top-8 active) + standard Q/K/V attention + partial RoPE |
| Total Parameters | 228.7 B |
| Active per Token | ~1.4 B |
| Profile | JANGTQ |
| Format | JANGTQ (codebook + Hadamard) — weight_format: mxtq in jang_config.json |
| Avg bits/param | ~2.15 |
| Disk | ~57 GB |
| Context length | 192 K tokens |
| Chat template | Always-reasoning (<think> opened at assistant start) |
What is JANGTQ?
JANGTQ (JANG TurboQuant) is a codebook-based quantization format for MoE
models on Apple Silicon. Routed expert weights stay in a compact codebook +
Hadamard-rotated form at runtime — no decompression to affine — and the
matmul path uses custom Metal kernels that read packed uint32 weights, look
up centroids in a small codebook, and accumulate dot products against a
Hadamard-rotated input (QuIP# rotate-input-once math).
Result vs uniform 2-bit affine: smaller on disk, higher quality, runs at ~89 % of affine 2-bit speed.
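As a rough mental model of the decode math, the sketch below shows what a 2-bit codebook lookup plus sign flip amounts to. This is a minimal NumPy illustration, not the repo's Metal kernels (which never materialize dequantized weights); the codebook values, 16-codes-per-word packing, and names are illustrative assumptions.

```python
import numpy as np

def unpack_2bit(packed: np.ndarray) -> np.ndarray:
    """Unpack uint32 words into 2-bit codebook indices (16 codes per word)."""
    shifts = np.arange(16, dtype=np.uint32) * 2
    return (packed[:, None] >> shifts) & 0b11            # (n_words, 16)

def decode_row(packed_row, codebook, hadamard_signs):
    """Reconstruct one weight row: centroid lookup, then undo rotation signs."""
    idx = unpack_2bit(packed_row).reshape(-1)            # 2-bit indices
    w = codebook[idx]                                     # centroid lookup
    return w * hadamard_signs

# Toy example with made-up values
rng = np.random.default_rng(0)
packed = rng.integers(0, np.iinfo(np.uint32).max, size=4, dtype=np.uint32)  # 64 weights
codebook = np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32)               # 4 centroids
signs = rng.choice([-1.0, 1.0], size=64).astype(np.float32)
print(decode_row(packed, codebook, signs)[:8])
```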
Bit Allocation
| Component | Bits | Format |
|---|---|---|
| Routed expert MLP (gate / up / down) | 2 | JANGTQ codebook + Hadamard |
| Attention (Q / K / V / O) | 8 | Affine (nn.QuantizedLinear, group_size=64) |
| Shared expert | 8 | Affine |
| Embed tokens / LM head | 8 | Affine |
| Router gate | fp16 | Unquantized nn.Linear |
| RMSNorms / RoPE / biases | fp16 | Unquantized |
The routed experts are 98% of parameters and the natural compression target. Everything else stays at 8-bit affine, so the quality-critical hot path keeps much higher precision.
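For the 8-bit affine side, the sketch below shows roughly what MLX's built-in affine quantization does to a single projection. It is a minimal illustration with a made-up layer size; the shipped shards are already quantized, so you never run this yourself.

```python
import mlx.core as mx
import mlx.nn as nn

# Stand-in for one attention projection (illustrative size).
linear = nn.Linear(4096, 4096, bias=False)

# 8-bit affine quantization with group_size=64, as used for attention,
# shared expert, embeddings, and lm_head in this bundle.
qlinear = nn.QuantizedLinear.from_linear(linear, group_size=64, bits=8)

x = mx.random.normal((1, 4096))
print(qlinear(x).shape)  # (1, 4096)
```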
Important Settings
MiniMax M2.7 is an always-reasoning model. The chat template
unconditionally opens <think> at each assistant turn.
| Setting | Value | Notes |
|---|---|---|
| Temperature | 1.0 | Required — temp=0 can cause thinking loops |
| Top-P | 0.95 | |
| Top-K | 40 | |
| Repetition Penalty | 1.1 | Optional, helps prevent loops |
| max_tokens | ≥ 8192 | Give reasoning room to converge |
Strip <think>…</think> from the response before using the final answer.
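A minimal helper for that (plain Python, assuming the generated text is a single string):

```python
import re

def strip_thinking(text: str) -> str:
    """Drop <think>...</think> reasoning blocks and return only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```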
Usage
This model requires the jang-tools loader — stock mlx_lm.load() does not
recognize weight_format: mxtq. The loader applies Metal kernel
monkey-patches at load time (fused gate+up+SwiGLU, gather TQ, multi-block
Hadamard, router compile, QKV fusion).
```bash
pip install jang-tools
```

```python
from huggingface_hub import snapshot_download
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate

model_path = snapshot_download("OsaurusAI/MiniMax-M2.7-JANGTQ")
model, tokenizer = load_jangtq_model(model_path)

messages = [{"role": "user", "content": "Explain photosynthesis in five sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

out = generate(model, tokenizer, prompt, max_tokens=600,
               temperature=1.0, verbose=True)
```
Swift — Osaurus / MLX Studio
Both clients auto-detect the JANGTQ runtime from jang_config.json and route
through the MiniMaxJANGTQModel class. Just load the repo — no extra flags.
What's In This Repo
| File | Role |
|---|---|
| `model-*.safetensors` (61 shards, ~57 GB) | Weights — 2-bit routed TQ + 8-bit affine |
| `model.safetensors.index.json` | Shard index |
| `jangtq_runtime.safetensors` | Codebooks + Hadamard signs sidecar (Swift loader) |
| `jang_config.json` | JANG metadata + Tier-1 capabilities stamp (reasoning=qwen3, tool=minimax) |
| `config.json` | HF model config (minimax_m2, weight_format=mxtq, mxtq_bits=2) |
| `chat_template.jinja`, `tokenizer.*`, `vocab.json`, `merges.txt` | Tokenizer + chat template |
| `configuration_minimax_m2.py`, `modeling_minimax_m2.py` | HF custom code (untouched from upstream) |
| `osaurus-x-banner.png`, `mlx-studio-logo.png` | Branding assets |
Parser Capabilities (Tier-1 auto-detected by Osaurus / vmlx)
```json
{
  "reasoning_parser": "qwen3",
  "tool_parser": "minimax",
  "think_in_template": true,
  "supports_tools": true,
  "supports_thinking": true,
  "family": "minimax_m2",
  "modality": "text",
  "cache_type": "kv"
}
```
<think> and <tool_call> are non-special tokens by design — the
application layer parses them. Osaurus and vmlx CapabilityDetector read
this block verbatim and wire the qwen3 reasoning parser + minimax tool
parser automatically, so streamed responses route reasoning_content and
tool_calls into the OpenAI-compatible SSE fields instead of leaking into
content.
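A hedged sketch of consuming that from a client, assuming an OpenAI-compatible server on localhost:8080 (for example mlx_lm.server or Osaurus) that emits the non-standard reasoning_content delta field described above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
stream = client.chat.completions.create(
    model="OsaurusAI/MiniMax-M2.7-JANGTQ",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=1.0,
    stream=True,
)

reasoning, answer = [], []
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Non-standard field: present only when the server splits reasoning out.
    part = getattr(delta, "reasoning_content", None)
    if part:
        reasoning.append(part)
    if delta.content:
        answer.append(delta.content)

print("final answer:", "".join(answer))
```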
License
MIT — see LICENSE.
Credits
Created by Jinho Jang — eric@jangq.ai
Based on MiniMaxAI's MiniMax M2.7. JANGTQ quantization © JANGQ-AI.