Upload README.md with huggingface_hub
README.md CHANGED
@@ -1,45 +1,57 @@
 ---
+language:
+- en
+- zh
+library_name: mlx
 license: other
 license_name: modified-mit
-tags:
-- moe
-- mixture-of-experts
-- jangtq
-- reap
-- mlx
-- minimax_m2
 pipeline_tag: text-generation
-
+base_model: MiniMaxAI/MiniMax-M2
+tags:
+- moe
+- mixture-of-experts
+- minimax_m2
+- quantized
+- apple-silicon
+- mlx
+- turboquant
+- jangtq
+- jangtq2
+- reap
 ---
 
-
+<p align="center">
+<a href="https://osaurus.ai"><img src="./osaurus-x-banner.png" alt="Osaurus AI"></a>
+</p>
 
-
-40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from
-MiniMax M2 via our v3 calibration corpus + REAP saliency observer +
-JANGTQ2 codebook quantization.
+<h3 align="center">MiniMax M2.7 Small — 138B-A10B — JANGTQ (MLX)</h3>
+<p align="center"><b>This is now a ~138B-A10B MoE</b> (down from MiniMax M2's 230B base) — 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
 
-
+<p align="center">
+<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>
+<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>
+<a href="https://huggingface.co/MiniMaxAI/MiniMax-M2"><img src="https://img.shields.io/badge/Base-MiniMax--M2-orange?logo=huggingface" alt="MiniMax M2"></a>
+</p>
+
+---
+
+## Model Details
 
-
+Runs on Apple Silicon via the JANG toolchain + MLX.
 
 ```
 MiniMax M2 (base)
-↓ v3 calibration corpus (
-↓ 10% academic_mc · 8% science · 8% CN ·
-↓ 5% cyber · 3% systems · 2% longctx)
+↓ v3 calibration corpus (code · agentic · general · academic · science · CN · cyber · systems · long-context)
 ↓
 REAP saliency observer (62 layers × 256 experts → scoring)
-↓
+↓ 40% expert prune (154 of 256 kept per layer)
 ↓
 JANGTQ2 quantization
-• 2-bit MXTQ on routed-expert weights (Hadamard-rotated codebook)
+• 2-bit MXTQ on routed-expert weights (Hadamard-rotated Lloyd-Max codebook)
 • 8-bit affine on attention + dense MLP + embed + lm_head
 • 16-bit on norms and router weights
 ```
 
-## Specs
-
 | | Value |
 |---|---|
 | Parameters | **~138B total, ~10B active per token** |
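A side note on the pipeline in this hunk: the diagram says the REAP observer scores 62 layers × 256 experts and the 40% prune keeps 154 per layer. As a rough illustration of how gate-weighted saliency scoring and expert selection can be wired, here is a minimal NumPy sketch; the `observe` hook and its inputs are hypothetical, not the `jang_tools` implementation.

```python
import numpy as np

NUM_LAYERS, NUM_EXPERTS, KEEP = 62, 256, 154   # 40% prune keeps 154 of 256

# Per-expert saliency, accumulated while streaming the calibration corpus.
scores = np.zeros((NUM_LAYERS, NUM_EXPERTS))

def observe(layer, gate_probs, expert_out_norms):
    """Hypothetical per-batch hook.
    gate_probs: (tokens, experts) router weights after top-k masking.
    expert_out_norms: (tokens, experts) L2 norm of each expert's output.
    Weighting expert activity by router usage is the REAP-style saliency idea."""
    scores[layer] += (gate_probs * expert_out_norms).sum(axis=0)

# ... run the calibration corpus through the model, calling observe() ...

# Keep the highest-saliency experts per layer, prune the rest.
keep = np.argsort(scores, axis=1)[:, -KEEP:]
```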
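A companion sketch for the 2-bit step named in the same diagram: a Lloyd-Max codebook fit after an orthonormal Hadamard rotation. This is generic NumPy/SciPy illustrating the technique, not the JANGTQ2/MXTQ code; real schemes fit codebooks per group and pack the 2-bit indices.

```python
import numpy as np
from scipy.linalg import hadamard

def lloyd_max_codebook(x, levels=4, iters=25):
    # 1-D Lloyd-Max: alternate nearest-centroid assignment and centroid update.
    c = np.quantile(x, np.linspace(0.1, 0.9, levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c

def quantize_2bit(w):
    n = w.shape[-1]                  # hadamard() needs a power-of-two size
    H = hadamard(n) / np.sqrt(n)     # orthonormal, so H @ H.T == I
    wr = (w @ H).ravel()             # rotation spreads outliers across the row
    cb = lloyd_max_codebook(wr)
    idx = np.abs(wr[:, None] - cb[None, :]).argmin(axis=1).astype(np.uint8)
    return idx.reshape(w.shape), cb, H

def dequantize(idx, cb, H):
    return cb[idx] @ H.T             # look up codewords, undo the rotation

w = np.random.randn(8, 64)
idx, cb, H = quantize_2bit(w)
print("mean abs error:", np.abs(dequantize(idx, cb, H) - w).mean())
```

The rotation matters because a four-level codebook wastes representation on heavy tails; flattening outliers first lets Lloyd-Max spend its levels where the mass is.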
@@ -58,7 +70,7 @@ from jang_tools.load_jangtq import load_jangtq_model
 from mlx_lm import generate
 from mlx_lm.sample_utils import make_sampler
 
-model, tokenizer = load_jangtq_model("
+model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")
 
 messages = [{"role": "user", "content": "Write a Python function that…"}]
 prompt = tokenizer.apply_chat_template(
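For convenience, the quick-start in this hunk assembled into one self-contained snippet. The repo id and sampler settings come from the diff itself; the `apply_chat_template` arguments are an assumption based on the standard tokenizer API, since the lines between these hunks are not shown.

```python
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")

messages = [{"role": "user", "content": "Write a Python function that…"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)  # assumed args

out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
               sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
print(out)
```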
@@ -71,64 +83,37 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
 sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
 ```
 
-## Calibration corpus (v3 mix)
-
-REAP saliency was computed over a 31,338-sample stratified English+CJK
-mix (~8.6 M tokens), bucketed to approximate typical JANGQ-AI workload:
-
-| Bucket | Share | Source datasets |
-|--------|-------|-----------------|
-| Coding (22%) | 7% · 6% · 4% · 3% · 2% | `ise-uiuc/Magicoder-OSS-Instruct-75K` · `nvidia/OpenCodeReasoning` · `m-a-p/CodeFeedback-Filtered-Instruction` · `HuggingFaceH4/CodeAlpaca_20K` · `iamtarun/python_code_instructions_18k_alpaca` |
-| Agentic (19%) | 7% · 5% · 3% · 2% · 2% | `NousResearch/hermes-function-calling-v1` · `glaiveai/glaive-function-calling-v2` · `lilacai/glaive-function-calling-v2-sharegpt` · `THUDM/AgentInstruct` (os) · `princeton-nlp/SWE-bench_oracle` |
-| General (17%) | 7% · 4% · 3% · 3% | `allenai/tulu-3-sft-mixture` · `open-thoughts/OpenThoughts-114k` · `teknium/OpenHermes-2.5` · `HuggingFaceH4/ultrachat_200k` |
-| Academic MC (11%) | 5% · 3% · 1% · 1% · 1% · 0.5% · 0.5% | `cais/mmlu` (all, auxiliary_train) · `TIGER-Lab/MMLU-Pro` · `allenai/ai2_arc` · `allenai/openbookqa` · `allenai/sciq` · `tau/commonsense_qa` · `bigbio/med_qa` |
-| Science (10%) | 4% · 3% · 1.5% · 1.5% | `AI-MO/NuminaMath-CoT` · `ccdv/arxiv-summarization` · `qiaojin/PubMedQA` · `camel-ai/physics` |
-| Chinese (9%) | 4% · 2.5% · 2.5% | `silk-road/alpaca-data-gpt4-chinese` · `wangrui6/Zhihu-KOL` · `YeungNLP/firefly-train-1.1M` |
-| Cybersec (5%) | 3% · 2% | `CyberNative/Code_Vulnerability_Security_DPO` · `Trendyol/cybersecurity-instruction-datasets` |
-| Long-context (3%) | 2% · 1% | `emozilla/pg19` · `ccdv/arxiv-summarization` (longer docs) |
-| Systems (3%) | 1.5% · 1.5% | `b-mc2/sql-create-context` · `cognitivecomputations/dolphin-coder` |
-
-Total ≈ 31,338 records ≈ 8.6 M tokens (GPT-4 tokenizer estimate).
-
 ## Evaluation
 
 ### HumanEval+ (code generation)
 
-- **Dataset**: `evalplus/humanevalplus` test split (
-original HumanEval but with much harder test cases from EvalPlus).
+- **Dataset**: `evalplus/humanevalplus` test split (164 prompts, harder tests than HumanEval).
 - **Protocol**: greedy pass@1 baseline + pass@5 retry on failures.
-- **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax
-
-stop on first passing sample.
-- **Max tokens**: 800 (pass@1), 1200 (pass@5 retry).
-- **Grading**: each candidate executed as a subprocess with 20s timeout;
-passes only if ALL EvalPlus tests pass.
+- **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); k=5 samples per failed problem, early stop on first pass.
+- **Grading**: each candidate run with 20s subprocess timeout; must pass ALL EvalPlus tests.
 
 | Metric | Score |
 |--------|-------|
 | **pass@1 (greedy)** | **71.95%** (118/164) |
-| **pass@5 (greedy + sampled retry
+| **pass@5 (greedy + sampled retry)** | **89.02%** (146/164) |
 
-28 of
-top_p=0.95, top_k=40); the remaining 18 are a mix of genuine logic
-errors (AssertionError) and prompts where even 1200 tokens ran out
-mid-reasoning (no_code_block).
-
-*Eval harness code*: see `jang_tools.kimi_prune.bench_humaneval` (pass@1) and `jang_tools.kimi_prune.bench_humaneval_passk` (pass@k retry on failures) in the [JANG toolchain](https://github.com/jinho-jang/jang).
+28 of 46 greedy failures recovered via sampling; 18 residual failures are genuine logic errors or prompts where 1200 tokens ran out mid-reasoning.
 
 ## Variants
 
 | Variant | Prune | Size | HF |
 |---------|-------|------|-----|
-| **MiniMax-M2.7-Small** | 40% | 38 GB | `
-| MiniMax-M2.7-Med | 25% | ~48 GB | `
-| MiniMax-M2.7-Large | 10% | ~57 GB | `
+| **MiniMax-M2.7-Small** | 40% | 38 GB | `OsaurusAI/MiniMax-M2.7-Small-JANGTQ` |
+| MiniMax-M2.7-Med | 25% | ~48 GB | `OsaurusAI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
+| MiniMax-M2.7-Large | 10% | ~57 GB | `OsaurusAI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
+
+Also released under `JANGQ-AI/MiniMax-M2.7-*-JANGTQ`.
 
 ## Credits
 
-Base: MiniMax M2.
-Methodology: JANG toolchain — REAP saliency + JANGTQ codebook quantization.
-
+Base model: [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
+Methodology: [JANG toolchain](https://github.com/jinho-jang/jang) — REAP saliency + JANGTQ codebook quantization.
+Served by: [Osaurus](https://osaurus.ai) — Apple-Silicon-native MLX inference.
 
 ## License
 
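Two illustrations for this hunk. First, the removed calibration-corpus section fixes exact bucket shares for the v3 mix; a stratified draw with those shares looks roughly like this sketch, where the record pools are placeholders rather than the actual dataset loaders.

```python
import random

# Bucket shares from the removed v3 table (they sum to ~99% in the source).
BUCKET_SHARES = {
    "coding": 0.22, "agentic": 0.19, "general": 0.17, "academic_mc": 0.11,
    "science": 0.10, "chinese": 0.09, "cybersec": 0.05, "longctx": 0.03,
    "systems": 0.03,
}
TOTAL = 31_338  # records in the v3 mix

def draw_mix(pools, total=TOTAL, seed=0):
    """pools: bucket name -> list of candidate records (placeholder loaders)."""
    rng = random.Random(seed)
    mix = []
    for bucket, share in BUCKET_SHARES.items():
        mix.extend(rng.sample(pools[bucket], round(share * total)))
    rng.shuffle(mix)
    return mix
```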
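Second, the evaluation bullets pin down the grading contract: each candidate runs in a subprocess with a 20 s timeout, counts as a pass only if every EvalPlus test passes, and greedy failures get up to five sampled retries with early stop on the first pass (800 tokens for pass@1, 1200 for retries). A rough sketch under those rules; `greedy_gen`, `sample_gen`, and `problem` are hypothetical stand-ins, the real harness being `jang_tools.kimi_prune.bench_humaneval` / `bench_humaneval_passk`.

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str, tests: str, timeout: float = 20.0) -> bool:
    """Execute candidate + EvalPlus tests in a subprocess; pass = exit 0 in time."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    try:
        return subprocess.run([sys.executable, path], timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def passes_at_5(problem, greedy_gen, sample_gen, k=5):
    """Greedy attempt first; on failure, sampled retries with early stop."""
    if run_candidate(greedy_gen(problem, max_tokens=800), problem.tests):
        return True
    return any(run_candidate(sample_gen(problem, max_tokens=1200), problem.tests)
               for _ in range(k))
```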