---
library_name: gguf
base_model: google/gemma-4-31B
tags:
- gguf
- rotorquant
- kv-cache-quantization
- gemma
- gemma4
- dense
- multimodal
- llama-cpp
- quantized
license: apache-2.0
---

# gemma-4-31B-RotorQuant-GGUF-Q4_K_M

GGUF Q4_K_M weight-quantized variant of [google/gemma-4-31B](https://huggingface.co/google/gemma-4-31B) with **RotorQuant** KV cache compression for efficient inference with llama.cpp, Ollama, and LM Studio.

## Overview

This model combines two compression techniques:

- **GGUF Q4_K_M weight quantization** — reduces the model from ~62 GB to ~18 GB on disk
- **RotorQuant KV cache compression** — block-diagonal rotations (Clifford algebra) enable a 3-bit KV cache and 5.3x faster prefill

## Quickstart

### llama.cpp

```bash
llama-cli -m gemma-4-31B-RotorQuant-GGUF-Q4_K_M.gguf \
  --cache-type-k planar3 --cache-type-v iso3 \
  -p "Explain quantum computing"
```
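For an OpenAI-compatible HTTP endpoint instead of an interactive CLI, llama.cpp's `llama-server` accepts the same cache-type flags. This is a sketch: whether a RotorQuant-enabled build wires `planar3`/`iso3` through `llama-server` exactly as through `llama-cli` is an assumption.

```bash
# Serve the model over HTTP (flags mirror the llama-cli invocation above).
llama-server -m gemma-4-31B-RotorQuant-GGUF-Q4_K_M.gguf \
  --cache-type-k planar3 --cache-type-v iso3 \
  --host 127.0.0.1 --port 8080
```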

### Ollama

```bash
ollama run majentik/gemma-4-31B-RotorQuant-GGUF-Q4_K_M
```

### LM Studio

Download the GGUF file and load it in LM Studio, then enable the RotorQuant KV cache under the advanced settings.

## Specifications

| Property | Value |
|----------|-------|
| Base Model | google/gemma-4-31B |
| Parameters | 31B (dense) |
| Weight Quantization | GGUF Q4_K_M |
| KV Cache | RotorQuant 3-bit (planar/iso) |
| File Size | ~18 GB |
| License | Apache 2.0 |
| Compatible With | llama.cpp, Ollama, LM Studio, koboldcpp |

## What is RotorQuant?

RotorQuant compresses the KV cache with block-diagonal rotations derived from Clifford algebra. When run with llama.cpp's `--cache-type-k planar3 --cache-type-v iso3` flags, it compares to a TurboQuant baseline as follows:

| Metric | RotorQuant | TurboQuant |
|--------|-----------|-----------|
| Prefill Speed | 3,822 tok/s | 722 tok/s |
| Decode Speed | 119 tok/s | 93 tok/s |
| Perplexity (lower is better) | 6.91 | 7.07 |
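The rotors themselves are not specified in this card, but the general shape of the technique can be sketched: apply an orthogonal block-diagonal rotation (here an independent 2x2 Givens rotation on each pair of dimensions, standing in for the actual Clifford-algebra rotors), then quantize the rotated vector to 3 bits. All names and parameters below are illustrative, not RotorQuant's real implementation.

```python
import math
import random

def rotate_pairs(vec, angles):
    """Block-diagonal rotation: one 2x2 (Givens) rotation per pair of dims."""
    out = []
    for (x, y), a in zip(zip(vec[0::2], vec[1::2]), angles):
        c, s = math.cos(a), math.sin(a)
        out += [c * x - s * y, s * x + c * y]
    return out

def quantize_3bit(vec):
    """Uniform 3-bit quantization: 8 levels spanning the vector's range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 7 or 1.0   # guard against a constant vector
    codes = [round((v - lo) / scale) for v in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

random.seed(0)
v = [random.gauss(0, 1) for _ in range(8)]            # toy "KV cache" vector
angles = [random.uniform(0, math.pi) for _ in range(4)]

rotated = rotate_pairs(v, angles)
codes, lo, scale = quantize_3bit(rotated)
recon = dequantize(codes, lo, scale)

norm = lambda u: math.sqrt(sum(x * x for x in u))
print(abs(norm(v) - norm(rotated)) < 1e-9)   # True: rotations preserve norm
print(all(0 <= c <= 7 for c in codes))       # True: codes fit in 3 bits
```

Because the rotation is orthogonal, it can be inverted exactly at dequantization time; only the 3-bit rounding loses information.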

## See Also

- [RotorQuant GitHub](https://github.com/scrya-com/rotorquant)
- [Base model](https://huggingface.co/google/gemma-4-31B)
- [MLX variants](https://huggingface.co/majentik/gemma-4-31B-RotorQuant-MLX-4bit)