majentik committed · Commit 011272b · verified · 1 Parent(s): 6ccf928

docs: Tier 2 polish — variant matrix + quant trade-off

Files changed (1): README.md (+64 -1)
@@ -80,4 +80,67 @@ Use the `mlx_lm.generate` API; `enable_thinking` is a runtime flag
  `enable_thinking` defaults to `True`. To disable extended reasoning
  (e.g., for latency-sensitive cases), pass `enable_thinking=False`
  to the chat template / generate call. No separate "no-think"
- variant card exists — this is a runtime flag, not a model variant.
+ variant card exists — this is a runtime flag, not a model variant.
+
+ ## Variants in this family
+
+ (Showing 56 sibling variants under `majentik/nemotron3-nano-omni-30b-*`. The current variant — `TurboQuant-MLX-5bit-TQ-KV` — is **bolded**.)
+
+ | Variant | Runtime | Approx size | Use case |
+ |---|---|---|---|
+ | [mmproj-F16](https://huggingface.co/majentik/nemotron3-nano-omni-30b-mmproj-f16) | llama-mtmd-cli | ~1-2 GB | Multimodal projector (pair with any GGUF) |
+ | [RotorQuant](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+ | [RotorQuant-GGUF-IQ4_XS](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-IQ4_XS) | llama.cpp | ~26 GB | Lossy 4-bit, low-RAM CPU/edge |
+ | [RotorQuant-GGUF-MXFP4_MOE](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-MXFP4_MOE) | llama.cpp | ~30 GB | MXFP4 MoE quant |
+ | [RotorQuant-GGUF-Q2_K](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-Q2_K) | llama.cpp | ~18 GB | Lossy, low-RAM CPU/edge |
+ | [RotorQuant-GGUF-Q3_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-Q3_K_M) | llama.cpp | ~23 GB | Smaller 3-bit, CPU-friendly |
+ | [RotorQuant-GGUF-Q4_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-Q4_K_M) | llama.cpp | ~33 GB | Balanced default |
+ | [RotorQuant-GGUF-Q5_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-Q5_K_M) | llama.cpp | ~40 GB | Higher fidelity, more RAM |
+ | [RotorQuant-GGUF-Q8_0](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-Q8_0) | llama.cpp | ~63 GB | Near-lossless reference |
+ | [RotorQuant-GGUF-IQ4_XS-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-iq4_xs-rq-kv) | llama.cpp | ~26 GB | IQ4_XS + RotorQuant KV |
+ | [RotorQuant-GGUF-MXFP4_MOE-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-mxfp4_moe-rq-kv) | llama.cpp | ~30 GB | MXFP4 MoE + RotorQuant KV |
+ | [RotorQuant-GGUF-Q2_K-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-q2_k-rq-kv) | llama.cpp | ~18 GB | Q2_K + RotorQuant KV |
+ | [RotorQuant-GGUF-Q3_K_M-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-q3_k_m-rq-kv) | llama.cpp | ~23 GB | Q3_K_M + RotorQuant KV |
+ | [RotorQuant-GGUF-Q4_K_M-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-q4_k_m-rq-kv) | llama.cpp | ~33 GB | Q4_K_M + RotorQuant KV |
+ | [RotorQuant-GGUF-Q5_K_M-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-q5_k_m-rq-kv) | llama.cpp | ~40 GB | Q5_K_M + RotorQuant KV |
+ | [RotorQuant-GGUF-Q8_0-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-gguf-q8_0-rq-kv) | llama.cpp | ~63 GB | Q8_0 + RotorQuant KV |
+ | [RotorQuant-MLX-2bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-2bit) | mlx-lm | ~9.6 GB | Apple Silicon, smallest |
+ | [RotorQuant-MLX-2bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-2bit-rq-kv) | mlx-lm | ~9.6 GB | 2-bit + RotorQuant KV |
+ | [RotorQuant-MLX-3bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-3bit) | mlx-lm | ~14 GB | Apple Silicon, small |
+ | [RotorQuant-MLX-3bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-3bit-rq-kv) | mlx-lm | ~14 GB | 3-bit + RotorQuant KV |
+ | [RotorQuant-MLX-4bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-4bit) | mlx-lm | ~19 GB | Apple Silicon balanced |
+ | [RotorQuant-MLX-4bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-4bit-rq-kv) | mlx-lm | ~19 GB | 4-bit + RotorQuant KV |
+ | [RotorQuant-MLX-5bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-5bit) | mlx-lm | ~23 GB | Apple Silicon, higher fidelity |
+ | [RotorQuant-MLX-5bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-5bit-rq-kv) | mlx-lm | ~23 GB | 5-bit + RotorQuant KV |
+ | [RotorQuant-MLX-6bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-6bit) | mlx-lm | ~27 GB | Apple Silicon, near-lossless |
+ | [RotorQuant-MLX-6bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-6bit-rq-kv) | mlx-lm | ~27 GB | 6-bit + RotorQuant KV |
+ | [RotorQuant-MLX-8bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-8bit) | mlx-lm | ~35 GB | Apple Silicon reference |
+ | [RotorQuant-MLX-8bit-RQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-8bit-rq-kv) | mlx-lm | ~35 GB | 8-bit + RotorQuant KV |
+ | [RotorQuant-MLX-MXFP4](https://huggingface.co/majentik/nemotron3-nano-omni-30b-rotorquant-mlx-mxfp4) | mlx-lm | ~19 GB | Apple Silicon MXFP4 |
+ | [TurboQuant](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+ | [TurboQuant-GGUF-IQ4_XS](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-IQ4_XS) | llama.cpp | ~26 GB | Lossy 4-bit, low-RAM CPU/edge |
+ | [TurboQuant-GGUF-MXFP4_MOE](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-MXFP4_MOE) | llama.cpp | ~30 GB | MXFP4 MoE quant |
+ | [TurboQuant-GGUF-Q2_K](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-Q2_K) | llama.cpp | ~18 GB | Lossy, low-RAM CPU/edge |
+ | [TurboQuant-GGUF-Q3_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-Q3_K_M) | llama.cpp | ~23 GB | Smaller 3-bit, CPU-friendly |
+ | [TurboQuant-GGUF-Q4_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-Q4_K_M) | llama.cpp | ~33 GB | Balanced default |
+ | [TurboQuant-GGUF-Q5_K_M](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-Q5_K_M) | llama.cpp | ~40 GB | Higher fidelity, more RAM |
+ | [TurboQuant-GGUF-Q8_0](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-Q8_0) | llama.cpp | ~63 GB | Near-lossless reference |
+ | [TurboQuant-GGUF-IQ4_XS-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-iq4_xs-tq-kv) | llama.cpp | ~26 GB | IQ4_XS + TurboQuant KV |
+ | [TurboQuant-GGUF-MXFP4_MOE-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-mxfp4_moe-tq-kv) | llama.cpp | ~30 GB | MXFP4 MoE + TurboQuant KV |
+ | [TurboQuant-GGUF-Q2_K-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-q2_k-tq-kv) | llama.cpp | ~18 GB | Q2_K + TurboQuant KV |
+ | [TurboQuant-GGUF-Q3_K_M-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-q3_k_m-tq-kv) | llama.cpp | ~23 GB | Q3_K_M + TurboQuant KV |
+ | [TurboQuant-GGUF-Q4_K_M-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-q4_k_m-tq-kv) | llama.cpp | ~33 GB | Q4_K_M + TurboQuant KV |
+ | [TurboQuant-GGUF-Q5_K_M-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-q5_k_m-tq-kv) | llama.cpp | ~40 GB | Q5_K_M + TurboQuant KV |
+ | [TurboQuant-GGUF-Q8_0-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-gguf-q8_0-tq-kv) | llama.cpp | ~63 GB | Q8_0 + TurboQuant KV |
+ | [TurboQuant-MLX-2bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-2bit) | mlx-lm | ~9.6 GB | Apple Silicon, smallest |
+ | [TurboQuant-MLX-2bit-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-2bit-tq-kv) | mlx-lm | ~9.6 GB | 2-bit + TurboQuant KV |
+ | [TurboQuant-MLX-3bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-3bit) | mlx-lm | ~14 GB | Apple Silicon, small |
+ | [TurboQuant-MLX-3bit-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-3bit-tq-kv) | mlx-lm | ~14 GB | 3-bit + TurboQuant KV |
+ | [TurboQuant-MLX-4bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-4bit) | mlx-lm | ~19 GB | Apple Silicon balanced |
+ | [TurboQuant-MLX-4bit-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-4bit-tq-kv) | mlx-lm | ~19 GB | 4-bit + TurboQuant KV |
+ | [TurboQuant-MLX-5bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-5bit) | mlx-lm | ~23 GB | Apple Silicon, higher fidelity |
+ | **TurboQuant-MLX-5bit-TQ-KV** | mlx-lm | ~23 GB | 5-bit + TurboQuant KV |
+ | [TurboQuant-MLX-6bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-6bit) | mlx-lm | ~27 GB | Apple Silicon, near-lossless |
+ | [TurboQuant-MLX-6bit-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-6bit-tq-kv) | mlx-lm | ~27 GB | 6-bit + TurboQuant KV |
+ | [TurboQuant-MLX-8bit](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-8bit) | mlx-lm | ~35 GB | Apple Silicon reference |
+ | [TurboQuant-MLX-8bit-TQ-KV](https://huggingface.co/majentik/nemotron3-nano-omni-30b-turboquant-mlx-8bit-tq-kv) | mlx-lm | ~35 GB | 8-bit + TurboQuant KV |
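
The "Approx size" column in the matrix above makes the quant trade-off mechanical: pick the highest-fidelity quant whose weights fit your memory budget. A hypothetical helper sketching that rule (not part of this repo; the function name and the 4 GB headroom figure for KV cache and runtime overhead are illustrative, and sizes are the approximate GGUF figures from the table):

```python
# Approximate GGUF weight sizes (GB) from the variant matrix above.
GGUF_SIZES_GB = {
    "Q2_K": 18, "Q3_K_M": 23, "IQ4_XS": 26, "MXFP4_MOE": 30,
    "Q4_K_M": 33, "Q5_K_M": 40, "Q8_0": 63,
}

def pick_quant(ram_gb, headroom_gb=4.0):
    """Largest quant whose weights fit in ram_gb minus headroom, or None."""
    budget = ram_gb - headroom_gb
    fitting = [(size, name) for name, size in GGUF_SIZES_GB.items() if size <= budget]
    # max() on (size, name) tuples picks the biggest (highest-fidelity) fit.
    return max(fitting)[1] if fitting else None

print(pick_quant(32))  # -> IQ4_XS (26 GB fits a 28 GB budget)
print(pick_quant(48))  # -> Q5_K_M
print(pick_quant(16))  # -> None (nothing fits; consider an MLX 2-bit variant)
```

The same reasoning applies to the MLX column on Apple Silicon, where unified memory is also shared with the rest of the system.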
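
The `enable_thinking` runtime flag described in the hunk above can be exercised through `mlx-lm`; a minimal sketch, assuming `mlx-lm` is installed, the variant is downloadable, and the model's chat template accepts `enable_thinking` as the hunk states (the repo id and prompt are illustrative):

```python
from mlx_lm import load, generate

# Illustrative repo id; substitute any MLX variant from the table above.
model, tokenizer = load("majentik/nemotron3-nano-omni-30b-turboquant-mlx-5bit-tq-kv")

messages = [{"role": "user", "content": "One-sentence summary of KV-cache quantization."}]

# enable_thinking defaults to True; pass False for latency-sensitive calls.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

Because this is a runtime flag, both modes come from the same weights — no separate "no-think" card is needed.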