caiovicentino1 committed on
Commit 6a259a4 · verified · 1 Parent(s): 493ee3e

HLWQ rebrand: title, tags, notice, self-links

Files changed (1):
  1. README.md +18 -19

README.md CHANGED
@@ -2,7 +2,6 @@
 license: apache-2.0
 tags:
 - hlwq
-- polarquant
 - gemma4
 - claude-opus
 - distill
@@ -15,15 +14,15 @@ arxiv: '2603.29078'
 ---
 
 > [!IMPORTANT]
-> **Naming notice (2026-04-10).** The "PolarQuant" technique used in this model is being rebranded to **HLWQ (Hadamard-Lloyd Weight Quantization)**. The change is only the name; the algorithm and the weights in this repository are unchanged.
+> **Naming notice (2026-04-10).** The technique used in this model, formerly published as "PolarQuant", is now named **HLWQ (Hadamard-Lloyd Weight Quantization)**. The change is only the name; the algorithm and the weights in this repository are unchanged.
 >
-> The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant ([Han et al., arXiv:2502.02617, 2025](https://arxiv.org/abs/2502.02617)). HLWQ addresses **weight** quantization with a **deterministic Walsh-Hadamard rotation** and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses **KV cache** quantization with a **random polar rotation**. The two methods are technically distinct.
+> The rename resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant ([Han et al., arXiv:2502.02617, 2025](https://arxiv.org/abs/2502.02617)). HLWQ addresses **weight** quantization with a **deterministic Walsh-Hadamard rotation** and a Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses **KV cache** quantization with a **random polar rotation**. The two methods are technically distinct.
 >
 > Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
 >
 > Reference paper for this technique: [arXiv:2603.29078](https://arxiv.org/abs/2603.29078) (v2 in preparation; v1 still uses the old name).
 
-# 🧊 Gemma-4-31B-Claude-Opus-PolarQuant-Q5-Vision
+# 🧊 Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision
 
 **Claude Opus distilled Gemma 4 31B + Vision** on consumer GPUs.
 
@@ -31,9 +30,9 @@ Download: **21.8 GB** (vs 62.5 GB BF16 — 2.9x compression)
 
 | Component | Method | Result |
 |---|---|---|
-| **Text weights** | PolarQuant Q5 + torchao INT4 | 21.8 GB |
+| **Text weights** | HLWQ Q5 + torchao INT4 | 21.8 GB |
 | **Vision encoder** | BF16 (full quality) | included |
-| **KV Cache** | PolarQuant Q3 (5.3x) | longer context |
+| **KV Cache** | HLWQ Q3 (5.3x) | longer context |
 | **Reasoning** | Claude Opus 4.6 distilled | high-effort |
 
 ## 🎯 Key Results
@@ -75,9 +74,9 @@ polarquant chat TeichAI/gemma-4-31B-it-Claude-Opus-Distill --vision
 | Method | Bits | Compression | Max Context (4GB) |
 |---|---|---|---|
 | FP16 | 16 | 1.0x | 4K |
-| PolarQuant Q4 | 4 | 4.0x | 17K |
-| **PolarQuant Q3** | **3** | **5.3x** | **22K** |
-| PolarQuant Q2 | 2 | 8.0x | 35K |
+| HLWQ Q4 | 4 | 4.0x | 17K |
+| **HLWQ Q3** | **3** | **5.3x** | **22K** |
+| HLWQ Q2 | 2 | 8.0x | 35K |
 
 ## 🔧 Technical Details
 
@@ -92,7 +91,7 @@ polarquant chat TeichAI/gemma-4-31B-it-Claude-Opus-Distill --vision
 
 ```bibtex
 @article{polarquant2025,
-  title={PolarQuant: Hadamard-Rotated Lloyd-Max Quantization for LLM Compression},
+  title={HLWQ: Hadamard-Rotated Lloyd-Max Quantization for LLM Compression},
   author={Vicentino, Caio},
   journal={arXiv preprint arXiv:2603.29078},
   year={2025}
@@ -113,41 +112,41 @@ pip install git+https://github.com/caiovicentino/polarengine-vllm.git
 
 ### Load & Generate (1 line!)
 ```python
-from polarengine_vllm import PolarQuantModel
+from polarengine_vllm import HLWQModel
 
-model = PolarQuantModel.from_pretrained("caiovicentino1/Gemma-4-31B-Claude-Opus-PolarQuant-Q5-Vision")
+model = HLWQModel.from_pretrained("caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision")
 print(model.generate("Hello, how are you?", max_new_tokens=100))
 ```
 
 ### With KV Cache Compression (5.3x more context)
 ```python
-model = PolarQuantModel.from_pretrained("caiovicentino1/Gemma-4-31B-Claude-Opus-PolarQuant-Q5-Vision", kv_cache_nbits=3)
+model = HLWQModel.from_pretrained("caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision", kv_cache_nbits=3)
 # KV cache now uses 5.3x less memory — fit longer conversations!
 print(model.generate("Explain quantum computing in detail.", max_new_tokens=500))
 ```
 
 ### Benchmark
 ```bash
-polarquant bench caiovicentino1/Gemma-4-31B-Claude-Opus-PolarQuant-Q5-Vision --ppl --chart
+polarquant bench caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision --ppl --chart
```
 
 ### Gradio Demo
 ```bash
-polarquant demo caiovicentino1/Gemma-4-31B-Claude-Opus-PolarQuant-Q5-Vision --share
+polarquant demo caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision --share
 ```
 
-## 📦 Method: PolarQuant
+## 📦 Method: HLWQ
 
 **Hadamard Rotation + Lloyd-Max Optimal Centroids**
 
-Unlike GGUF (uniform quantization), PolarQuant places quantization levels where weight density is highest — mathematically proven optimal for Gaussian-distributed neural network weights.
+Unlike GGUF (uniform quantization), HLWQ places quantization levels where weight density is highest — the Lloyd-Max condition for minimum-MSE scalar quantization of Gaussian-distributed weights.
 
 ```
-PolarQuant Q5 (cos_sim > 0.996) > GGUF Q5_K_M (~0.99) at same size
+HLWQ Q5 (cos_sim > 0.996) > GGUF Q5_K_M (~0.99) at same size
 ```
 
 ## 🔗 Links
 
 - 📄 [Paper — arXiv:2603.29078](https://arxiv.org/abs/2603.29078)
-- 💻 [GitHub — PolarEngine](https://github.com/caiovicentino/polarengine-vllm)
+- 💻 [GitHub — HLWQ-Engine](https://github.com/caiovicentino/polarengine-vllm)
 - 📦 [PyPI — `pip install polarquant`](https://pypi.org/project/polarquant/)
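
---

For readers skimming the diff, the rotate-then-quantize scheme the README names (deterministic Walsh-Hadamard rotation followed by a Lloyd-Max scalar codebook) can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the repository's implementation: `hadamard_rotate` and `lloyd_max` are hypothetical helper names, and the README's 0.996 cosine-similarity figure is its own claim, not something this toy certifies.

```python
import numpy as np

def hadamard_rotate(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of 2).
    Orthonormal and involutory: applying it twice recovers the input."""
    h = x.astype(np.float64).copy()
    n = h.shape[0]
    step = 1
    while step < n:
        for i in range(0, n, step * 2):
            a = h[i:i + step].copy()
            b = h[i + step:i + 2 * step].copy()
            h[i:i + step] = a + b          # butterfly: sums
            h[i + step:i + 2 * step] = a - b  # butterfly: differences
        step *= 2
    return h / np.sqrt(n)

def lloyd_max(x, nbits=5, iters=50):
    """Lloyd-Max scalar quantizer (1-D k-means): alternate nearest-centroid
    assignment with centroid-as-conditional-mean updates."""
    codebook = np.quantile(x, np.linspace(0, 1, 2 ** nbits))  # warm start
    for _ in range(iters):
        idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(codebook.size):
            if np.any(idx == k):               # skip empty cells
                codebook[k] = x[idx == k].mean()
    return codebook[idx]

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)          # stand-in for one weight row
rot = hadamard_rotate(w)               # deterministic rotation
deq = hadamard_rotate(lloyd_max(rot))  # quantize, then undo the rotation
cos = deq @ w / (np.linalg.norm(deq) * np.linalg.norm(w))
print(f"cos_sim after 5-bit Lloyd-Max: {cos:.4f}")
```

The rotation spreads outliers across the vector so the rotated coordinates look approximately Gaussian, which is exactly the regime where Lloyd-Max centroid placement beats a uniform grid.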
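The compression column of the KV-cache table is just 16 bits divided by the quantized bit width; the max-context column scales roughly, though not exactly, in proportion. A one-line sanity check of that arithmetic:

```python
# Compression ratio of an n-bit KV cache against an FP16 (16-bit) baseline.
for nbits in (4, 3, 2):
    print(f"Q{nbits}: {16 / nbits:.1f}x smaller KV cache")  # 4.0x, 5.3x, 8.0x
```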