Instructions for using OsaurusAI/MiniMax-M2.7-JANGTQ with libraries, inference providers, notebooks, and local apps.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-JANGTQ")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OsaurusAI/MiniMax-M2.7-JANGTQ" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OsaurusAI/MiniMax-M2.7-JANGTQ
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-JANGTQ with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "OsaurusAI/MiniMax-M2.7-JANGTQ"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (default port 8080)
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OsaurusAI/MiniMax-M2.7-JANGTQ",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
⚠️ REQUIRED — jangtq_runtime.safetensors sidecar must be downloaded

Osaurus uses the native Swift JANGTQ runtime. Every JANGTQ bundle on OsaurusAI ships a small jangtq_runtime.safetensors sidecar (10 KB–165 KB) alongside the weight shards. If the file is absent, the Swift loader will refuse to start with the error:

```
Error: Model '<name>' declares JANGTQ (weight_format: "mxtq") but is missing required sidecar file 'jangtq_runtime.safetensors'. Re-download the full model or obtain the sidecar from the original publisher.
```

If your local copy doesn't have it (older download, partial sync, etc.):

```bash
hf download OsaurusAI/MiniMax-M2.7-JANGTQ jangtq_runtime.safetensors --local-dir <your-dir>
```

The file holds the deterministic codebooks + Hadamard rotation signs the Swift loader uses to decode *.tq_packed weights. It must match the seed the bundle was quantized with (mxtq_seed=42).
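The same sidecar can also be pulled from Python with huggingface_hub (a minimal sketch; the local directory path is a placeholder for wherever your copy of the model lives):

```python
from huggingface_hub import hf_hub_download

# Fetch only the JANGTQ sidecar into an existing local copy of the model.
hf_hub_download(
    repo_id="OsaurusAI/MiniMax-M2.7-JANGTQ",
    filename="jangtq_runtime.safetensors",
    local_dir="path/to/your-local-model",  # placeholder: point at your model directory
)
```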
MiniMax M2.7 — JANGTQ (MLX)
TurboQuant codebook quantization of MiniMax's 228B agentic MoE — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / shared-expert / lm_head at 8-bit affine.
Model Details
| Property | Value |
|---|---|
| Base Model | MiniMaxAI/MiniMax-M2.7 |
| Architecture | MoE (256 experts, top-8 active) + standard Q/K/V attention + partial RoPE |
| Total Parameters | 228.7 B |
| Active per Token | ~1.4 B |
| Profile | JANGTQ |
| Format | JANGTQ (codebook + Hadamard) — weight_format: mxtq in jang_config.json |
| Avg bits/param | ~2.15 |
| Disk | ~57 GB |
| Context length | 192 K tokens |
| Chat template | Always-reasoning (<think> opened at assistant start) |
What is JANGTQ?
JANGTQ (JANG TurboQuant) is a codebook-based quantization format for MoE
models on Apple Silicon. Routed expert weights stay in a compact codebook +
Hadamard-rotated form at runtime — no decompression to affine — and the
matmul path uses custom Metal kernels that read packed uint32 weights, look
up centroids in a small codebook, and accumulate dot products against a
Hadamard-rotated input (QuIP# rotate-input-once math).
Result vs uniform 2-bit affine: smaller on disk, higher quality, runs at ~89 % of affine 2-bit speed.
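As a rough mental model of the decode math, the sketch below shows what a 2-bit codebook lookup plus sign flip amounts to. This is a minimal NumPy illustration, not the repo's Metal kernels (which never materialize dequantized weights); the codebook values, 16-codes-per-word packing, and names are illustrative assumptions.

```python
import numpy as np

def unpack_2bit(packed: np.ndarray) -> np.ndarray:
    """Unpack uint32 words into 2-bit codebook indices (16 codes per word)."""
    shifts = np.arange(16, dtype=np.uint32) * 2
    return (packed[:, None] >> shifts) & 0b11            # (n_words, 16)

def decode_row(packed_row, codebook, hadamard_signs):
    """Reconstruct one weight row: centroid lookup, then undo rotation signs."""
    idx = unpack_2bit(packed_row).reshape(-1)            # 2-bit indices
    w = codebook[idx]                                     # centroid lookup
    return w * hadamard_signs

# Toy example with made-up values
rng = np.random.default_rng(0)
packed = rng.integers(0, np.iinfo(np.uint32).max, size=4, dtype=np.uint32)  # 64 weights
codebook = np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32)               # 4 centroids
signs = rng.choice([-1.0, 1.0], size=64).astype(np.float32)
print(decode_row(packed, codebook, signs)[:8])
```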
Bit Allocation
| Component | Bits | Format |
|---|---|---|
| Routed expert MLP (gate / up / down) | 2 | JANGTQ codebook + Hadamard |
| Attention (Q / K / V / O) | 8 | Affine (nn.QuantizedLinear, group_size=64) |
| Shared expert | 8 | Affine |
| Embed tokens / LM head | 8 | Affine |
| Router gate | fp16 | Unquantized nn.Linear |
| RMSNorms / RoPE / biases | fp16 | Unquantized |
The routed experts are 98% of parameters and the natural compression target. Everything else stays at 8-bit affine, so the quality-critical hot path keeps much higher precision.
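For the 8-bit affine side, the sketch below shows roughly what MLX's built-in affine quantization does to a single projection. It is a minimal illustration with a made-up layer size; the shipped shards are already quantized, so you never run this yourself.

```python
import mlx.core as mx
import mlx.nn as nn

# Stand-in for one attention projection (illustrative size).
linear = nn.Linear(4096, 4096, bias=False)

# 8-bit affine quantization with group_size=64, as used for attention,
# shared expert, embeddings, and lm_head in this bundle.
qlinear = nn.QuantizedLinear.from_linear(linear, group_size=64, bits=8)

x = mx.random.normal((1, 4096))
print(qlinear(x).shape)  # (1, 4096)
```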
Important Settings
MiniMax M2.7 is an always-reasoning model. The chat template
unconditionally opens <think> at each assistant turn.
| Setting | Value | Notes |
|---|---|---|
| Temperature | 1.0 | Required — temp=0 can cause thinking loops |
| Top-P | 0.95 | |
| Top-K | 40 | |
| Repetition Penalty | 1.1 | Optional, helps prevent loops |
| max_tokens | ≥ 8192 | Give reasoning room to converge |
Strip <think>…</think> from the response before using the final answer.
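A minimal helper for that (plain Python, assuming the generated text is a single string):

```python
import re

def strip_thinking(text: str) -> str:
    """Drop <think>...</think> reasoning blocks and return only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```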
Usage
This model requires the jang-tools loader — stock mlx_lm.load() does not
recognize weight_format: mxtq. The loader applies Metal kernel
monkey-patches at load time (fused gate+up+SwiGLU, gather TQ, multi-block
Hadamard, router compile, QKV fusion).
```bash
pip install jang-tools
```

```python
from huggingface_hub import snapshot_download
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate

model_path = snapshot_download("OsaurusAI/MiniMax-M2.7-JANGTQ")
model, tokenizer = load_jangtq_model(model_path)

messages = [{"role": "user", "content": "Explain photosynthesis in five sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

out = generate(model, tokenizer, prompt, max_tokens=600,
               temperature=1.0, verbose=True)
```
Swift — Osaurus / MLX Studio
Both clients auto-detect the JANGTQ runtime from jang_config.json and route
through the MiniMaxJANGTQModel class. Just load the repo — no extra flags.
What's In This Repo
| File | Role |
|---|---|
| `model-*.safetensors` (61 shards, ~57 GB) | Weights — 2-bit routed TQ + 8-bit affine |
| `model.safetensors.index.json` | Shard index |
| `jangtq_runtime.safetensors` | Codebooks + Hadamard signs sidecar (Swift loader) |
| `jang_config.json` | JANG metadata + Tier-1 capabilities stamp (reasoning=qwen3, tool=minimax) |
| `config.json` | HF model config (minimax_m2, weight_format=mxtq, mxtq_bits=2) |
| `chat_template.jinja`, `tokenizer.*`, `vocab.json`, `merges.txt` | Tokenizer + chat template |
| `configuration_minimax_m2.py`, `modeling_minimax_m2.py` | HF custom code (untouched from upstream) |
| `osaurus-x-banner.png`, `mlx-studio-logo.png` | Branding assets |
Parser Capabilities (Tier-1 auto-detected by Osaurus / vmlx)
```json
{
  "reasoning_parser": "qwen3",
  "tool_parser": "minimax",
  "think_in_template": true,
  "supports_tools": true,
  "supports_thinking": true,
  "family": "minimax_m2",
  "modality": "text",
  "cache_type": "kv"
}
```
<think> and <tool_call> are non-special tokens by design — the
application layer parses them. Osaurus and vmlx CapabilityDetector read
this block verbatim and wire the qwen3 reasoning parser + minimax tool
parser automatically, so streamed responses route reasoning_content and
tool_calls into the OpenAI-compatible SSE fields instead of leaking into
content.
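A hedged sketch of consuming that from a client, assuming an OpenAI-compatible server on localhost:8080 (for example mlx_lm.server or Osaurus) that emits the non-standard reasoning_content delta field described above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
stream = client.chat.completions.create(
    model="OsaurusAI/MiniMax-M2.7-JANGTQ",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=1.0,
    stream=True,
)

reasoning, answer = [], []
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Non-standard field: present only when the server splits reasoning out.
    part = getattr(delta, "reasoning_content", None)
    if part:
        reasoning.append(part)
    if delta.content:
        answer.append(delta.content)

print("final answer:", "".join(answer))
```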
License
MIT — see LICENSE.
Credits
Created by Jinho Jang — eric@jangq.ai
Based on MiniMaxAI's MiniMax M2.7. JANGTQ quantization © JANGQ-AI.