GGUF
English
qwen3_5
qwen3.5
reasoning
chain-of-thought
self-correction
llama.cpp
unsloth
conversational
Match GGUF model card to main Harmonic-9B card with images and full content
- .gitattributes +1 -0
- README.md +126 -14
- competitor_comparison.png +0 -0
- pipeline.png +0 -0
- reasoning_flow.png +0 -0
- training_quality.png +3 -0
.gitattributes
CHANGED
@@ -41,3 +41,4 @@ Harmonic-9B-F16.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+training_quality.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -10,6 +10,7 @@ tags:
 - self-correction
 - llama.cpp
 - unsloth
 base_model: DJLougen/Harmonic-9B
 ---

@@ -23,7 +24,11 @@ base_model: DJLougen/Harmonic-9B

 GGUF quantizations of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.

-

 ## Available Quantizations

@@ -44,12 +49,90 @@ Harmonic-9B is a reasoning-focused fine-tune of Qwen 3.5 9B trained on structura
 | `Harmonic-9B-IQ4_XS.gguf` | IQ4_XS | 4.3 | ~4.9 GB | Smallest 4-bit, importance matrix |
 | `Harmonic-9B-Q3_K_M.gguf` | Q3_K_M | 3.9 | ~4.6 GB | Smallest footprint, some quality loss |

-## Recommended Quant

 **Q5_K_M** for most users - fits in 8GB VRAM with room for context, minimal quality degradation on reasoning tasks.

 **Q8_0** if you have the VRAM - preserves the full reasoning depth that the model was trained for.

 ## Usage

 ### Ollama
@@ -68,28 +151,57 @@ ollama run DJLougen/Harmonic-9B-GGUF

 Download any quantization and load in LM Studio. The model follows standard ChatML formatting.

-##

-

-
-

-

-

-

-

-

 ## License

-Apache 2.0 -

 ## Links

-- Full model: [DJLougen/Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B)
--
@@ -10,6 +10,7 @@ tags:
 - self-correction
 - llama.cpp
 - unsloth
+- conversational
 base_model: DJLougen/Harmonic-9B
 ---

@@ -23,7 +24,11 @@ base_model: DJLougen/Harmonic-9B

 GGUF quantizations of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.

+A reasoning-focused fine-tune of [Qwen 3.5 9B](https://huggingface.co/Qwen/Qwen3.5-9B) trained on structurally validated data where every row passes automated quality gates. No junk, no filler, no shallow traces.
+
+The name comes from harmonic analysis of reasoning patterns - the structural signal that separates genuine thinking from surface-level chain-of-thought.
+
+For the agentic tool-calling variant, see [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) (coming soon) - a Stage 2 fine-tune of this model on quality-filtered agent traces from [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered).

 ## Available Quantizations

@@ -44,12 +49,90 @@ Harmonic-9B is a reasoning-focused fine-tune of Qwen 3.5 9B trained on structura
 | `Harmonic-9B-IQ4_XS.gguf` | IQ4_XS | 4.3 | ~4.9 GB | Smallest 4-bit, importance matrix |
 | `Harmonic-9B-Q3_K_M.gguf` | Q3_K_M | 3.9 | ~4.6 GB | Smallest footprint, some quality loss |

+### Recommended Quant

 **Q5_K_M** for most users - fits in 8GB VRAM with room for context, minimal quality degradation on reasoning tasks.

 **Q8_0** if you have the VRAM - preserves the full reasoning depth that the model was trained for.

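The size column in the quantization table can be sanity-checked with simple arithmetic: file size is roughly parameter count times bits per weight. The sketch below assumes the 9.65B parameter count given in the Architecture section and treats the third table column as bits per weight (the header row is not visible in this hunk, so that reading is an assumption). The helper name `estimate_gguf_size_gib` is hypothetical, and the estimate ignores metadata and tensors kept at higher precision.

```python
def estimate_gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GiB: parameters * bits per weight / 8 bytes."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1024**3


# Assumption: 9.65e9 parameters, per the Architecture section of the card.
N_PARAMS = 9.65e9

# Bits-per-weight values copied from the two table rows visible in this diff.
for name, bpw in [("IQ4_XS", 4.3), ("Q3_K_M", 3.9)]:
    print(f"{name}: ~{estimate_gguf_size_gib(N_PARAMS, bpw):.2f} GiB")
```

The estimates are only a first-order check against the listed sizes; the same function can be used to judge whether a given quantization leaves headroom for the KV cache on a particular GPU.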
+### Vision (Multimodal)
+
+This model includes `Harmonic-9B-BF16-mmproj.gguf` - the vision projector for multimodal inference. Use with llama.cpp's `--mmproj` flag for image understanding tasks.
+
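As a rough illustration of the `--mmproj` flag mentioned above, the sketch below only assembles a command line and prints it; it does not run anything. The binary name `llama-mtmd-cli` and the `--image` and `-p` flags are assumptions about the multimodal CLI shipped with recent llama.cpp builds, so check `--help` on your installed version. The image path and prompt are placeholders, and `build_mtmd_command` is a hypothetical helper.

```python
import shlex


def build_mtmd_command(model_path, mmproj_path, image_path, prompt,
                       binary="llama-mtmd-cli"):
    """Assemble an argv list that pairs a text model with its vision projector."""
    return [
        binary,
        "-m", model_path,         # quantized language model weights
        "--mmproj", mmproj_path,  # vision projector named in the card
        "--image", image_path,    # placeholder image path (assumption)
        "-p", prompt,
    ]


cmd = build_mtmd_command(
    "Harmonic-9B-Q4_K_M.gguf",
    "Harmonic-9B-BF16-mmproj.gguf",
    "example.png",
    "Describe this image.",
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to pass to `subprocess.run` later without shell quoting issues.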
+## Training Approach
+
+
+
+**1,817 curated rows.** That's it. Following the [LIMO hypothesis](https://huggingface.co/papers/2502.03387) (Less Is More for Reasoning), Harmonic uses a small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining - the fine-tune teaches it a reasoning behavior pattern.
+
+Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
+
+A small set of everyday conversation data is mixed in to preserve the base model's conversational ability - calibrated by token ratio analysis to prevent the reasoning data from drowning out conversational patterns during training.
+
+## Training Data Quality
+
+
+
+The reasoning data was curated using a custom structural process supervision pipeline. Key metrics:
+
+| Metric | Value |
+|---|---|
+| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
+| Thinking trace depth | 1,667 words average |
+| Self-correction | 100% of rows (17.2 per row avg) |
+| Verification | 100% of rows (10.3 per row avg) |
+| Exploration | 100% of rows (6.3 per row avg) |
+| Quality gate pass rate | 100% |
+
+Every row was scored across multiple structural dimensions and only rows passing all thresholds simultaneously were included. No rows were manually curated - the pipeline is fully automated and reproducible.
+
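The card does not publish the pipeline itself, so the following is only a minimal sketch of what an "all thresholds simultaneously" gate could look like. The phrase lists, the thresholds, and the function names `count_markers` and `passes_gates` are all hypothetical stand-ins, loosely based on the example phrases quoted in the Training Approach section.

```python
import re

# Hypothetical marker phrases; the real pipeline's markers are not published.
MARKERS = {
    "self_correction": [r"\bwait\b", r"that's not right", r"\bactually\b"],
    "verification": [r"let me (?:check|verify)", r"plugging back in"],
    "exploration": [r"\balternatively\b", r"another approach"],
}


def count_markers(trace: str) -> dict[str, int]:
    """Count occurrences of each marker family in a thinking trace."""
    text = trace.lower()
    return {
        kind: sum(len(re.findall(pattern, text)) for pattern in patterns)
        for kind, patterns in MARKERS.items()
    }


def passes_gates(trace: str, min_words: int = 20, min_per_kind: int = 1) -> bool:
    """A row passes only if every threshold holds at the same time."""
    counts = count_markers(trace)
    long_enough = len(trace.split()) >= min_words
    return long_enough and all(c >= min_per_kind for c in counts.values())


sample = (
    "Let me try substitution first. Wait, that's not right, the sign flips. "
    "Alternatively, I could try factoring. Let me check by plugging back in: "
    "both sides equal four, so the answer holds."
)
print(count_markers(sample))
print(passes_gates(sample))
```

A conjunctive gate like this is strict by design: a trace with many verification steps but no exploration still fails, which is consistent with the low pass rates the card reports for other datasets.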
+## How It Compares
+
+
+
+We ran our structural quality analysis against seven public reasoning datasets, including those commonly used for Opus/Qwen distillation. The results:
+
+| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
+|---|---|---|---|---|---|---|---|
+| **Harmonic (ours)** | **1,817** | **1,667** | **100%** | **100%** | **100%** | **78.7** | **100%** |
+| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
+| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
+| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
+| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
+| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
+| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
+| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |
+
+The popular Opus distillation datasets (Crownelius, nohurry, TeichAI) have quality gate pass rates below 1%. Their thinking traces average between 188 and 323 words, with self-correction in fewer than 18% of rows. Models trained on this data learn to produce short, shallow chain-of-thought that looks like reasoning but lacks the structural behaviors that make reasoning reliable.
+
+Jackrong and Stratos are closer competitors but still fall short on consistency. Jackrong has massive traces (6,653 words avg) but only 22.7% pass the quality gate - the thinking is verbose but wanders. Stratos has decent markers but only 49% of rows pass, meaning about half the gradient updates during training push the model toward shallow patterns.
+
+Harmonic's data is smaller by design. Every row passes. Every gradient update reinforces genuine reasoning behavior.
+
+## Reasoning Flow
+
+
+
+Marker density measured across 20 equal segments of each thinking trace. The characteristic curve shows reasoning intensity building through the middle of the trace and peaking in the later segments as the model enters verification and self-correction before committing to an answer.
+
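The segment-wise measurement described above can be sketched as follows. This is an illustrative reconstruction, not the author's code: the marker regex is a hypothetical stand-in, segments are split by word count, and density is defined here as marker hits per word in each segment.

```python
import re

# Hypothetical marker set; the card does not list the phrases it counts.
MARKER_RE = re.compile(
    r"\b(wait|alternatively|let me check|let me verify)\b", re.IGNORECASE
)


def marker_density(trace: str, n_segments: int = 20) -> list[float]:
    """Split a trace into equal word-count segments and return hits per word."""
    words = trace.split()
    densities = []
    for i in range(n_segments):
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        segment = " ".join(words[start:end])
        hits = len(MARKER_RE.findall(segment))
        densities.append(hits / max(end - start, 1))  # avoid division by zero
    return densities


# Toy trace: 90 filler words followed by 10 markers, so density rises at the end.
toy_trace = "step " * 90 + "wait " * 10
curve = marker_density(toy_trace)
print([round(d, 2) for d in curve])
```

Averaging such curves over every trace in a dataset would give a profile comparable in spirit to the plotted one, under the stated assumptions about markers and segmentation.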
+## Training Configuration
+
+```
+base_model: Qwen/Qwen3.5-9B
+dataset: 1,459 reasoning + 358 conversation rows
+epochs: 1
+learning_rate: 1e-4
+lr_scheduler: cosine
+warmup_ratio: 0.1
+max_seq_length: 8192
+lora_rank: 32
+lora_alpha: 32
+dropout: 0.05
+micro_batch_size: 1
+gradient_accumulation_steps: 4
+weight_decay: 0.01
+```
+
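A few quantities follow directly from the configuration above. The sketch below works them out under assumptions that the card does not state: a single training device, no sample packing, and a final partial batch that still counts as one optimizer step. The LoRA scale uses the conventional `alpha / rank` definition.

```python
import math

# Values copied from the configuration block.
reasoning_rows, conversation_rows = 1459, 358
micro_batch_size, gradient_accumulation_steps = 1, 4
epochs, warmup_ratio = 1, 0.1
lora_rank, lora_alpha = 32, 32

total_rows = reasoning_rows + conversation_rows

# Assumption: one device, so the effective batch is micro batch * accumulation.
effective_batch = micro_batch_size * gradient_accumulation_steps

# Assumption: no packing, and the last partial batch still triggers a step.
steps_per_epoch = math.ceil(total_rows / effective_batch)
total_steps = steps_per_epoch * epochs

# Trainers round this differently; truncation is used here for illustration.
warmup_steps = int(total_steps * warmup_ratio)

# Conventional LoRA scaling factor applied to the low-rank update.
lora_scale = lora_alpha / lora_rank

print(f"rows={total_rows}, effective_batch={effective_batch}")
print(f"optimizer_steps={total_steps}, warmup_steps={warmup_steps}")
print(f"lora_scale={lora_scale}")
```

With only a few hundred optimizer steps in a single epoch, each row is seen exactly once, which is why the card puts so much weight on every row passing the quality gate.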
 ## Usage

 ### Ollama
@@ -68,28 +151,57 @@ ollama run DJLougen/Harmonic-9B-GGUF

 Download any quantization and load in LM Studio. The model follows standard ChatML formatting.

+### Reasoning format

+The model uses `<think>` blocks for reasoning:

+```
+<think>
+The user is asking about X. Let me consider two approaches...

+Approach 1: ...
+Approach 2: ...

+I'll go with Approach 1 because...

+Wait, I need to be careful here - this assumes Y, which may not hold.
+Let me verify by checking a special case...

+Yes, that confirms the result.
+</think>

+[Final answer here]
+```
+
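When consuming raw model output, it is often useful to separate the `<think>` block from the final answer. The sketch below assumes at most one `<think>...</think>` block appears before the answer, as in the format shown above; `split_reasoning` is a hypothetical helper name.

```python
import re

# Non-greedy match so only the first think block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block exists."""
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer


raw = "<think>\nLet me verify by checking a special case...\n</think>\n\n42"
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Some runtimes strip or relocate the reasoning block themselves, so this kind of post-processing is only needed when the raw text is returned unchanged.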
+## Intended Use
+
+- Reasoning tasks requiring genuine multi-step thinking
+- Mathematical problem-solving with self-correction
+- Code analysis and generation with structured verification
+- General conversation (conversational ability preserved through training design)
+- Base model for Stage 2 agentic fine-tuning
+
+## Limitations
+
+- 9B parameter model - not suitable for tasks requiring extensive world knowledge
+- Reasoning traces can be verbose for simple questions
+- Not optimized for tool calling - see Harmonic-Hermes-9B (coming soon) for agentic use
+- Benchmark evaluation is ongoing
+
+## Architecture
+
+- **Base**: Qwen 3.5 9B (9.65B parameters)
+- **Training**: LoRA fine-tuning, merged into base weights
+- **Precision**: BF16
+- **Context**: 8192 tokens

 ## License

+Apache 2.0 - same as the base model. All training data is from Apache 2.0 or MIT licensed sources. Commercial use is permitted.

 ## Links

+- Full model weights: [DJLougen/Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B)
+- Agentic variant: Harmonic-Hermes-9B (coming soon)
+- Filtered agent dataset: [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered)
+- LIMO paper: [Less is More for Reasoning](https://huggingface.co/papers/2502.03387)
competitor_comparison.png
ADDED
pipeline.png
ADDED
reasoning_flow.png
ADDED
training_quality.png
ADDED