Osaurus-AI committed on
Commit 838d726 · verified · 1 Parent(s): 272dc9f

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +48 -63
README.md CHANGED
@@ -1,45 +1,57 @@
  ---
  license: other
  license_name: modified-mit
- tags:
- - moe
- - mixture-of-experts
- - jangtq
- - reap
- - mlx
- - minimax_m2
  pipeline_tag: text-generation
- library_name: mlx
  ---
 
- # MiniMax-M2.7-Small-JANGTQ
 
- **This is now a ~138B-A10B MoE** (down from MiniMax M2's 230B base)
- 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from
- MiniMax M2 via our v3 calibration corpus + REAP saliency observer +
- JANGTQ2 codebook quantization.
 
- Runs on Apple Silicon via the JANG toolchain + MLX.
 
- ## Pipeline
 
  ```
  MiniMax M2 (base)
- ↓ v3 calibration corpus (24% code · 20% agentic · 20% general ·
- ↓ 10% academic_mc · 8% science · 8% CN ·
- ↓ 5% cyber · 3% systems · 2% longctx)
 
  REAP saliency observer (62 layers × 256 experts → scoring)
- MiniMax-M2.7-Small = 40% expert prune (154 of 256 kept per layer)
 
  JANGTQ2 quantization
- • 2-bit MXTQ on routed-expert weights (Hadamard-rotated codebook)
  • 8-bit affine on attention + dense MLP + embed + lm_head
  • 16-bit on norms and router weights
  ```
 
- ## Specs
-
  | | Value |
  |---|---|
  | Parameters | **~138B total, ~10B active per token** |
@@ -58,7 +70,7 @@ from jang_tools.load_jangtq import load_jangtq_model
  from mlx_lm import generate
  from mlx_lm.sample_utils import make_sampler
 
- model, tokenizer = load_jangtq_model("JANGQ-AI/MiniMax-M2.7-Small-JANGTQ")
 
  messages = [{"role": "user", "content": "Write a Python function that…"}]
  prompt = tokenizer.apply_chat_template(
@@ -71,64 +83,37 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
  sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
  ```
 
- ## Calibration corpus (v3 mix)
-
- REAP saliency was computed over a 31,338-sample stratified English+CJK
- mix (~8.6 M tokens), bucketed to approximate typical JANGQ-AI workload:
-
- | Bucket | Share | Source datasets |
- |--------|-------|-----------------|
- | Coding (22%) | 7% · 6% · 4% · 3% · 2% | `ise-uiuc/Magicoder-OSS-Instruct-75K` · `nvidia/OpenCodeReasoning` · `m-a-p/CodeFeedback-Filtered-Instruction` · `HuggingFaceH4/CodeAlpaca_20K` · `iamtarun/python_code_instructions_18k_alpaca` |
- | Agentic (19%) | 7% · 5% · 3% · 2% · 2% | `NousResearch/hermes-function-calling-v1` · `glaiveai/glaive-function-calling-v2` · `lilacai/glaive-function-calling-v2-sharegpt` · `THUDM/AgentInstruct` (os) · `princeton-nlp/SWE-bench_oracle` |
- | General (17%) | 7% · 4% · 3% · 3% | `allenai/tulu-3-sft-mixture` · `open-thoughts/OpenThoughts-114k` · `teknium/OpenHermes-2.5` · `HuggingFaceH4/ultrachat_200k` |
- | Academic MC (11%) | 5% · 3% · 1% · 1% · 1% · 0.5% · 0.5% | `cais/mmlu` (all, auxiliary_train) · `TIGER-Lab/MMLU-Pro` · `allenai/ai2_arc` · `allenai/openbookqa` · `allenai/sciq` · `tau/commonsense_qa` · `bigbio/med_qa` |
- | Science (10%) | 4% · 3% · 1.5% · 1.5% | `AI-MO/NuminaMath-CoT` · `ccdv/arxiv-summarization` · `qiaojin/PubMedQA` · `camel-ai/physics` |
- | Chinese (9%) | 4% · 2.5% · 2.5% | `silk-road/alpaca-data-gpt4-chinese` · `wangrui6/Zhihu-KOL` · `YeungNLP/firefly-train-1.1M` |
- | Cybersec (5%) | 3% · 2% | `CyberNative/Code_Vulnerability_Security_DPO` · `Trendyol/cybersecurity-instruction-datasets` |
- | Long-context (3%) | 2% · 1% | `emozilla/pg19` · `ccdv/arxiv-summarization` (longer docs) |
- | Systems (3%) | 1.5% · 1.5% | `b-mc2/sql-create-context` · `cognitivecomputations/dolphin-coder` |
-
- Total ≈ 31,338 records ≈ 8.6 M tokens (GPT-4 tokenizer estimate).
-
  ## Evaluation
 
  ### HumanEval+ (code generation)
 
- - **Dataset**: `evalplus/humanevalplus` test split (same 164 prompts as
- original HumanEval but with much harder test cases from EvalPlus).
  - **Protocol**: greedy pass@1 baseline + pass@5 retry on failures.
- - **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax
- official recommended sampling); k=5 samples per failed problem, early
- stop on first passing sample.
- - **Max tokens**: 800 (pass@1), 1200 (pass@5 retry).
- - **Grading**: each candidate executed as a subprocess with 20s timeout;
- passes only if ALL EvalPlus tests pass.
 
  | Metric | Score |
  |--------|-------|
  | **pass@1 (greedy)** | **71.95%** (118/164) |
- | **pass@5 (greedy + sampled retry of failures)** | **89.02%** (146/164) |
 
- 28 of the 46 greedy failures were recovered via sampling (temp=1.0,
- top_p=0.95, top_k=40); the remaining 18 are a mix of genuine logic
- errors (AssertionError) and prompts where even 1200 tokens ran out
- mid-reasoning (no_code_block).
-
- *Eval harness code*: see `jang_tools.kimi_prune.bench_humaneval` (pass@1) and `jang_tools.kimi_prune.bench_humaneval_passk` (pass@k retry on failures) in the [JANG toolchain](https://github.com/jinho-jang/jang).
  ## Variants
 
  | Variant | Prune | Size | HF |
  |---------|-------|------|-----|
- | **MiniMax-M2.7-Small** | 40% | 38 GB | `JANGQ-AI/MiniMax-M2.7-Small-JANGTQ` |
- | MiniMax-M2.7-Med | 25% | ~48 GB | `JANGQ-AI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
- | MiniMax-M2.7-Large | 10% | ~57 GB | `JANGQ-AI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
 
  ## Credits
 
- Base: MiniMax M2.
- Methodology: JANG toolchain — REAP saliency + JANGTQ codebook quantization.
- Release: JANGQ-AI (eric@jangq.ai).
 
  ## License
  ---
+ language:
+ - en
+ - zh
+ library_name: mlx
  license: other
  license_name: modified-mit
  pipeline_tag: text-generation
+ base_model: MiniMaxAI/MiniMax-M2
+ tags:
+ - moe
+ - mixture-of-experts
+ - minimax_m2
+ - quantized
+ - apple-silicon
+ - mlx
+ - turboquant
+ - jangtq
+ - jangtq2
+ - reap
  ---
 
+ <p align="center">
+ <a href="https://osaurus.ai"><img src="./osaurus-x-banner.png" alt="Osaurus AI"></a>
+ </p>
 
+ <h3 align="center">MiniMax M2.7 Small &mdash; 138B-A10B &mdash; JANGTQ (MLX)</h3>
+ <p align="center"><b>This is now a ~138B-A10B MoE</b> (down from MiniMax M2's 230B base) &mdash; 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization &mdash; routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
 
+ <p align="center">
+ <a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>&nbsp;
+ <a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>&nbsp;
+ <a href="https://huggingface.co/MiniMaxAI/MiniMax-M2"><img src="https://img.shields.io/badge/Base-MiniMax--M2-orange?logo=huggingface" alt="MiniMax M2"></a>
+ </p>
+
+ ---
+
+ ## Model Details
 
+ Runs on Apple Silicon via the JANG toolchain + MLX.
 
  ```
  MiniMax M2 (base)
+ ↓ v3 calibration corpus (code · agentic · general · academic · science · CN · cyber · systems · long-context)
 
  REAP saliency observer (62 layers × 256 experts → scoring)
+ ↓ 40% expert prune (154 of 256 kept per layer)
 
  JANGTQ2 quantization
+ • 2-bit MXTQ on routed-expert weights (Hadamard-rotated Lloyd-Max codebook)
  • 8-bit affine on attention + dense MLP + embed + lm_head
  • 16-bit on norms and router weights
  ```
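The expert-prune step above keeps the 154 highest-saliency experts in each of the 62 layers. A minimal sketch of that selection step is shown below; it assumes the REAP observer has already produced a per-layer, per-expert saliency matrix, and the helper is illustrative only, not part of the JANG toolchain API.

```python
# Illustrative only: top-k expert selection from precomputed saliency scores.
# How REAP computes the saliency itself is not shown here.
import numpy as np

N_LAYERS, N_EXPERTS, N_KEEP = 62, 256, 154  # 40% prune keeps 154 of 256 experts

def select_experts(saliency: np.ndarray, n_keep: int = N_KEEP) -> np.ndarray:
    """saliency: (layers, experts) scores -> boolean keep-mask (True = expert kept)."""
    keep = np.zeros_like(saliency, dtype=bool)
    for layer in range(saliency.shape[0]):
        top = np.argsort(saliency[layer])[-n_keep:]  # highest-scoring experts in this layer
        keep[layer, top] = True
    return keep

# Random scores standing in for real observer output.
mask = select_experts(np.random.rand(N_LAYERS, N_EXPERTS))
assert (mask.sum(axis=1) == N_KEEP).all()  # exactly 154 experts kept per layer
```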
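The 2-bit routed-expert path can be pictured the same way: rotate with a Hadamard matrix so outliers are spread across dimensions, then fit a 4-level (2-bit) Lloyd-Max codebook and store only the indices. The toy sketch below assumes one codebook per output row and a power-of-two column count; it illustrates the idea, not the actual JANGTQ2/MXTQ implementation.

```python
# Toy 2-bit Hadamard + Lloyd-Max quantizer (illustrative; not JANGTQ2 itself).
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_1d(x: np.ndarray, levels: int = 4, iters: int = 25) -> np.ndarray:
    """Fit `levels` scalar centroids to x (2 bits -> 4 levels)."""
    c = np.quantile(x, np.linspace(0.1, 0.9, levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c

def quantize_2bit(W: np.ndarray):
    """Rotate columns with a Hadamard matrix, then snap each row to its own 4-level codebook."""
    H = hadamard(W.shape[1])
    Wr = W @ H                                    # rotation spreads outliers across dimensions
    books = np.stack([lloyd_max_1d(row) for row in Wr])
    codes = np.abs(Wr[:, :, None] - books[:, None, :]).argmin(axis=2)
    return H, books, codes.astype(np.uint8)       # store codebooks + 2-bit indices

def dequantize(H, books, codes):
    Wr_hat = np.take_along_axis(books, codes.astype(np.int64), axis=1)
    return Wr_hat @ H.T                           # undo the orthonormal rotation

W = np.random.randn(8, 64)                        # stand-in for an expert weight tile
H, books, codes = quantize_2bit(W)
err = np.linalg.norm(W - dequantize(H, books, codes)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

With only 4 levels the reconstruction is obviously lossy; the point is just to show where the Hadamard rotation and the per-group Lloyd-Max codebooks sit in the pipeline.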
  | | Value |
  |---|---|
  | Parameters | **~138B total, ~10B active per token** |

@@ -58,7 +70,7 @@ from jang_tools.load_jangtq import load_jangtq_model
  from mlx_lm import generate
  from mlx_lm.sample_utils import make_sampler
 
+ model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")
 
  messages = [{"role": "user", "content": "Write a Python function that…"}]
  prompt = tokenizer.apply_chat_template(
@@ -71,64 +83,37 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
  sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
  ```
  ## Evaluation
 
  ### HumanEval+ (code generation)
 
+ - **Dataset**: `evalplus/humanevalplus` test split (164 prompts, harder tests than HumanEval).
  - **Protocol**: greedy pass@1 baseline + pass@5 retry on failures.
+ - **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); k=5 samples per failed problem, early stop on first pass.
+ - **Grading**: each candidate run with 20s subprocess timeout; must pass ALL EvalPlus tests.
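A minimal sketch of this grading loop (not the JANG toolchain's actual `bench_humaneval` / `bench_humaneval_passk` harness): problem loading, prompt construction, and generation are left as hypothetical callables, and only the subprocess-with-timeout check plus the greedy-then-sampled-retry control flow are shown.

```python
# Illustrative pass@1 / pass@5 grading loop. `generate_greedy`, `generate_sampled`
# and the problem dict layout are hypothetical stand-ins, not a real harness API.
import subprocess
import sys
import tempfile

TIMEOUT_S = 20  # per-candidate execution budget, as in the protocol above

def passes_all_tests(candidate_code: str, test_code: str) -> bool:
    """Run candidate + EvalPlus tests in a fresh subprocess; any error or timeout is a fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True, timeout=TIMEOUT_S)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def grade(problems, generate_greedy, generate_sampled, k: int = 5):
    """pass@1 with greedy decoding, then up to k sampled retries on each greedy failure."""
    pass1 = passk = 0
    for prob in problems:
        if passes_all_tests(generate_greedy(prob["prompt"]), prob["tests"]):
            pass1 += 1
            passk += 1
            continue
        for _ in range(k):  # retry with sampling; stop at the first passing sample
            if passes_all_tests(generate_sampled(prob["prompt"]), prob["tests"]):
                passk += 1
                break
    n = len(problems)
    return pass1 / n, passk / n
```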
 
 
 
 
  | Metric | Score |
  |--------|-------|
  | **pass@1 (greedy)** | **71.95%** (118/164) |
+ | **pass@5 (greedy + sampled retry)** | **89.02%** (146/164) |
 
+ 28 of 46 greedy failures recovered via sampling; 18 residual failures are genuine logic errors or prompts where 1200 tokens ran out mid-reasoning.
 
  ## Variants
 
  | Variant | Prune | Size | HF |
  |---------|-------|------|-----|
+ | **MiniMax-M2.7-Small** | 40% | 38 GB | `OsaurusAI/MiniMax-M2.7-Small-JANGTQ` |
+ | MiniMax-M2.7-Med | 25% | ~48 GB | `OsaurusAI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
+ | MiniMax-M2.7-Large | 10% | ~57 GB | `OsaurusAI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
+
+ Also released under `JANGQ-AI/MiniMax-M2.7-*-JANGTQ`.
 
  ## Credits
 
+ Base model: [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
+ Methodology: [JANG toolchain](https://github.com/jinho-jang/jang) — REAP saliency + JANGTQ codebook quantization.
+ Served by: [Osaurus](https://osaurus.ai) — Apple-Silicon-native MLX inference.
 
  ## License