Osaurus-AI committed on
Commit 838d726 · verified · 1 Parent(s): 272dc9f

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +48 -63
README.md CHANGED
@@ -1,45 +1,57 @@
  ---
  license: other
  license_name: modified-mit
- tags:
- - moe
- - mixture-of-experts
- - jangtq
- - reap
- - mlx
- - minimax_m2
  pipeline_tag: text-generation
- library_name: mlx
  ---
 
- # MiniMax-M2.7-Small-JANGTQ
 
- **This is now a ~138B-A10B MoE** (down from MiniMax M2's 230B base)
- 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from
- MiniMax M2 via our v3 calibration corpus + REAP saliency observer +
- JANGTQ2 codebook quantization.
 
- Runs on Apple Silicon via the JANG toolchain + MLX.
 
- ## Pipeline
 
  ```
  MiniMax M2 (base)
- ↓ v3 calibration corpus (24% code · 20% agentic · 20% general ·
- ↓ 10% academic_mc · 8% science · 8% CN ·
- ↓ 5% cyber · 3% systems · 2% longctx)
 
  REAP saliency observer (62 layers × 256 experts → scoring)
- MiniMax-M2.7-Small = 40% expert prune (154 of 256 kept per layer)
 
  JANGTQ2 quantization
- • 2-bit MXTQ on routed-expert weights (Hadamard-rotated codebook)
  • 8-bit affine on attention + dense MLP + embed + lm_head
  • 16-bit on norms and router weights
  ```
 
- ## Specs
-
  | | Value |
  |---|---|
  | Parameters | **~138B total, ~10B active per token** |
@@ -58,7 +70,7 @@ from jang_tools.load_jangtq import load_jangtq_model
  from mlx_lm import generate
  from mlx_lm.sample_utils import make_sampler
 
- model, tokenizer = load_jangtq_model("JANGQ-AI/MiniMax-M2.7-Small-JANGTQ")
 
  messages = [{"role": "user", "content": "Write a Python function that…"}]
  prompt = tokenizer.apply_chat_template(
@@ -71,64 +83,37 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
  sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
  ```
 
- ## Calibration corpus (v3 mix)
-
- REAP saliency was computed over a 31,338-sample stratified English+CJK
- mix (~8.6 M tokens), bucketed to approximate typical JANGQ-AI workload:
-
- | Bucket | Share | Source datasets |
- |--------|-------|-----------------|
- | Coding (22%) | 7% · 6% · 4% · 3% · 2% | `ise-uiuc/Magicoder-OSS-Instruct-75K` · `nvidia/OpenCodeReasoning` · `m-a-p/CodeFeedback-Filtered-Instruction` · `HuggingFaceH4/CodeAlpaca_20K` · `iamtarun/python_code_instructions_18k_alpaca` |
- | Agentic (19%) | 7% · 5% · 3% · 2% · 2% | `NousResearch/hermes-function-calling-v1` · `glaiveai/glaive-function-calling-v2` · `lilacai/glaive-function-calling-v2-sharegpt` · `THUDM/AgentInstruct` (os) · `princeton-nlp/SWE-bench_oracle` |
- | General (17%) | 7% · 4% · 3% · 3% | `allenai/tulu-3-sft-mixture` · `open-thoughts/OpenThoughts-114k` · `teknium/OpenHermes-2.5` · `HuggingFaceH4/ultrachat_200k` |
- | Academic MC (11%) | 5% · 3% · 1% · 1% · 1% · 0.5% · 0.5% | `cais/mmlu` (all, auxiliary_train) · `TIGER-Lab/MMLU-Pro` · `allenai/ai2_arc` · `allenai/openbookqa` · `allenai/sciq` · `tau/commonsense_qa` · `bigbio/med_qa` |
- | Science (10%) | 4% · 3% · 1.5% · 1.5% | `AI-MO/NuminaMath-CoT` · `ccdv/arxiv-summarization` · `qiaojin/PubMedQA` · `camel-ai/physics` |
- | Chinese (9%) | 4% · 2.5% · 2.5% | `silk-road/alpaca-data-gpt4-chinese` · `wangrui6/Zhihu-KOL` · `YeungNLP/firefly-train-1.1M` |
- | Cybersec (5%) | 3% · 2% | `CyberNative/Code_Vulnerability_Security_DPO` · `Trendyol/cybersecurity-instruction-datasets` |
- | Long-context (3%) | 2% · 1% | `emozilla/pg19` · `ccdv/arxiv-summarization` (longer docs) |
- | Systems (3%) | 1.5% · 1.5% | `b-mc2/sql-create-context` · `cognitivecomputations/dolphin-coder` |
-
- Total ≈ 31,338 records ≈ 8.6 M tokens (GPT-4 tokenizer estimate).
-
  ## Evaluation
 
  ### HumanEval+ (code generation)
 
- - **Dataset**: `evalplus/humanevalplus` test split (same 164 prompts as
- original HumanEval but with much harder test cases from EvalPlus).
  - **Protocol**: greedy pass@1 baseline + pass@5 retry on failures.
- - **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax
- official recommended sampling); k=5 samples per failed problem, early
- stop on first passing sample.
- - **Max tokens**: 800 (pass@1), 1200 (pass@5 retry).
- - **Grading**: each candidate executed as a subprocess with 20s timeout;
- passes only if ALL EvalPlus tests pass.
 
  | Metric | Score |
  |--------|-------|
  | **pass@1 (greedy)** | **71.95%** (118/164) |
- | **pass@5 (greedy + sampled retry of failures)** | **89.02%** (146/164) |
 
- 28 of the 46 greedy failures were recovered via sampling (temp=1.0,
- top_p=0.95, top_k=40); the remaining 18 are a mix of genuine logic
- errors (AssertionError) and prompts where even 1200 tokens ran out
- mid-reasoning (no_code_block).
-
- *Eval harness code*: see `jang_tools.kimi_prune.bench_humaneval` (pass@1) and `jang_tools.kimi_prune.bench_humaneval_passk` (pass@k retry on failures) in the [JANG toolchain](https://github.com/jinho-jang/jang).
  ## Variants
 
  | Variant | Prune | Size | HF |
  |---------|-------|------|-----|
- | **MiniMax-M2.7-Small** | 40% | 38 GB | `JANGQ-AI/MiniMax-M2.7-Small-JANGTQ` |
- | MiniMax-M2.7-Med | 25% | ~48 GB | `JANGQ-AI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
- | MiniMax-M2.7-Large | 10% | ~57 GB | `JANGQ-AI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
 
  ## Credits
 
- Base: MiniMax M2.
- Methodology: JANG toolchain — REAP saliency + JANGTQ codebook quantization.
- Release: JANGQ-AI (eric@jangq.ai).
 
  ## License
  ---
+ language:
+ - en
+ - zh
+ library_name: mlx
  license: other
  license_name: modified-mit
  pipeline_tag: text-generation
+ base_model: MiniMaxAI/MiniMax-M2
+ tags:
+ - moe
+ - mixture-of-experts
+ - minimax_m2
+ - quantized
+ - apple-silicon
+ - mlx
+ - turboquant
+ - jangtq
+ - jangtq2
+ - reap
  ---
 
+ <p align="center">
+ <a href="https://osaurus.ai"><img src="./osaurus-x-banner.png" alt="Osaurus AI"></a>
+ </p>
 
+ <h3 align="center">MiniMax M2.7 Small &mdash; 138B-A10B &mdash; JANGTQ (MLX)</h3>
+ <p align="center"><b>This is now a ~138B-A10B MoE</b> (down from MiniMax M2's 230B base) &mdash; 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization &mdash; routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
 
+ <p align="center">
+ <a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>&nbsp;
+ <a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>&nbsp;
+ <a href="https://huggingface.co/MiniMaxAI/MiniMax-M2"><img src="https://img.shields.io/badge/Base-MiniMax--M2-orange?logo=huggingface" alt="MiniMax M2"></a>
+ </p>
+
+ ---
+
+ ## Model Details
 
+ Runs on Apple Silicon via the JANG toolchain + MLX.
 
  ```
  MiniMax M2 (base)
+ ↓ v3 calibration corpus (code · agentic · general · academic · science · CN · cyber · systems · long-context)
 
  REAP saliency observer (62 layers × 256 experts → scoring)
+ ↓ 40% expert prune (154 of 256 kept per layer)
 
  JANGTQ2 quantization
+ • 2-bit MXTQ on routed-expert weights (Hadamard-rotated Lloyd-Max codebook)
  • 8-bit affine on attention + dense MLP + embed + lm_head
  • 16-bit on norms and router weights
  ```
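The expert-prune step above keeps the 154 highest-saliency experts in each of the 62 layers. A minimal sketch of that selection step is shown below; it assumes the REAP observer has already produced a per-layer, per-expert saliency matrix, and the helper is illustrative only, not part of the JANG toolchain API.

```python
# Illustrative only: top-k expert selection from precomputed saliency scores.
# How REAP computes the saliency itself is not shown here.
import numpy as np

N_LAYERS, N_EXPERTS, N_KEEP = 62, 256, 154  # 40% prune keeps 154 of 256 experts

def select_experts(saliency: np.ndarray, n_keep: int = N_KEEP) -> np.ndarray:
    """saliency: (layers, experts) scores -> boolean keep-mask (True = expert kept)."""
    keep = np.zeros_like(saliency, dtype=bool)
    for layer in range(saliency.shape[0]):
        top = np.argsort(saliency[layer])[-n_keep:]  # highest-scoring experts in this layer
        keep[layer, top] = True
    return keep

# Random scores standing in for real observer output.
mask = select_experts(np.random.rand(N_LAYERS, N_EXPERTS))
assert (mask.sum(axis=1) == N_KEEP).all()  # exactly 154 experts kept per layer
```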
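The 2-bit routed-expert path can be pictured the same way: rotate with a Hadamard matrix so outliers are spread across dimensions, then fit a 4-level (2-bit) Lloyd-Max codebook and store only the indices. The toy sketch below assumes one codebook per output row and a power-of-two column count; it illustrates the idea, not the actual JANGTQ2/MXTQ implementation.

```python
# Toy 2-bit Hadamard + Lloyd-Max quantizer (illustrative; not JANGTQ2 itself).
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_1d(x: np.ndarray, levels: int = 4, iters: int = 25) -> np.ndarray:
    """Fit `levels` scalar centroids to x (2 bits -> 4 levels)."""
    c = np.quantile(x, np.linspace(0.1, 0.9, levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c

def quantize_2bit(W: np.ndarray):
    """Rotate columns with a Hadamard matrix, then snap each row to its own 4-level codebook."""
    H = hadamard(W.shape[1])
    Wr = W @ H                                    # rotation spreads outliers across dimensions
    books = np.stack([lloyd_max_1d(row) for row in Wr])
    codes = np.abs(Wr[:, :, None] - books[:, None, :]).argmin(axis=2)
    return H, books, codes.astype(np.uint8)       # store codebooks + 2-bit indices

def dequantize(H, books, codes):
    Wr_hat = np.take_along_axis(books, codes.astype(np.int64), axis=1)
    return Wr_hat @ H.T                           # undo the orthonormal rotation

W = np.random.randn(8, 64)                        # stand-in for an expert weight tile
H, books, codes = quantize_2bit(W)
err = np.linalg.norm(W - dequantize(H, books, codes)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

With only 4 levels the reconstruction is obviously lossy; the point is just to show where the Hadamard rotation and the per-group Lloyd-Max codebooks sit in the pipeline.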
  | | Value |
  |---|---|
  | Parameters | **~138B total, ~10B active per token** |

@@ -58,7 +70,7 @@ from jang_tools.load_jangtq import load_jangtq_model
  from mlx_lm import generate
  from mlx_lm.sample_utils import make_sampler
 
+ model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")
 
  messages = [{"role": "user", "content": "Write a Python function that…"}]
  prompt = tokenizer.apply_chat_template(
@@ -71,64 +83,37 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
  sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
  ```
  ## Evaluation
 
  ### HumanEval+ (code generation)
 
+ - **Dataset**: `evalplus/humanevalplus` test split (164 prompts, harder tests than HumanEval).
  - **Protocol**: greedy pass@1 baseline + pass@5 retry on failures.
+ - **Sampling for pass@5**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); k=5 samples per failed problem, early stop on first pass.
+ - **Grading**: each candidate run with 20s subprocess timeout; must pass ALL EvalPlus tests.
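A minimal sketch of this grading loop (not the JANG toolchain's actual `bench_humaneval` / `bench_humaneval_passk` harness): problem loading, prompt construction, and generation are left as hypothetical callables, and only the subprocess-with-timeout check plus the greedy-then-sampled-retry control flow are shown.

```python
# Illustrative pass@1 / pass@5 grading loop. `generate_greedy`, `generate_sampled`
# and the problem dict layout are hypothetical stand-ins, not a real harness API.
import subprocess
import sys
import tempfile

TIMEOUT_S = 20  # per-candidate execution budget, as in the protocol above

def passes_all_tests(candidate_code: str, test_code: str) -> bool:
    """Run candidate + EvalPlus tests in a fresh subprocess; any error or timeout is a fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True, timeout=TIMEOUT_S)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def grade(problems, generate_greedy, generate_sampled, k: int = 5):
    """pass@1 with greedy decoding, then up to k sampled retries on each greedy failure."""
    pass1 = passk = 0
    for prob in problems:
        if passes_all_tests(generate_greedy(prob["prompt"]), prob["tests"]):
            pass1 += 1
            passk += 1
            continue
        for _ in range(k):  # retry with sampling; stop at the first passing sample
            if passes_all_tests(generate_sampled(prob["prompt"]), prob["tests"]):
                passk += 1
                break
    n = len(problems)
    return pass1 / n, passk / n
```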
 
 
 
 
  | Metric | Score |
  |--------|-------|
  | **pass@1 (greedy)** | **71.95%** (118/164) |
+ | **pass@5 (greedy + sampled retry)** | **89.02%** (146/164) |
 
+ 28 of 46 greedy failures recovered via sampling; 18 residual failures are genuine logic errors or prompts where 1200 tokens ran out mid-reasoning.
 
  ## Variants
 
  | Variant | Prune | Size | HF |
  |---------|-------|------|-----|
+ | **MiniMax-M2.7-Small** | 40% | 38 GB | `OsaurusAI/MiniMax-M2.7-Small-JANGTQ` |
+ | MiniMax-M2.7-Med | 25% | ~48 GB | `OsaurusAI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
+ | MiniMax-M2.7-Large | 10% | ~57 GB | `OsaurusAI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |
+
+ Also released under `JANGQ-AI/MiniMax-M2.7-*-JANGTQ`.
 
  ## Credits
 
+ Base model: [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
+ Methodology: [JANG toolchain](https://github.com/jinho-jang/jang) — REAP saliency + JANGTQ codebook quantization.
+ Served by: [Osaurus](https://osaurus.ai) — Apple-Silicon-native MLX inference.
 
  ## License