DJLougen committed on Apr 4

Commit 0b1aa35 (verified) · 1 Parent(s): 00e3e62

Match GGUF model card to main Harmonic-9B card with images and full content

Files changed (6)

.gitattributes +1 -0
README.md +126 -14
competitor_comparison.png +0 -0
pipeline.png +0 -0
reasoning_flow.png +0 -0
training_quality.png +3 -0

.gitattributes CHANGED

@@ -41,3 +41,4 @@ Harmonic-9B-F16.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Harmonic-9B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+training_quality.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -10,6 +10,7 @@ tags:
 - self-correction
 - llama.cpp
 - unsloth
 base_model: DJLougen/Harmonic-9B
 ---
@@ -23,7 +24,11 @@ base_model: DJLougen/Harmonic-9B
 GGUF quantizations of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.
-Harmonic-9B is a reasoning-focused fine-tune of Qwen 3.5 9B trained on structurally validated data where every row passes automated quality gates. See the [full model card](https://huggingface.co/DJLougen/Harmonic-9B) for training details, data quality analysis, and pipeline documentation.
 ## Available Quantizations
@@ -44,12 +49,90 @@ Harmonic-9B is a reasoning-focused fine-tune of Qwen 3.5 9B trained on structura
 | `Harmonic-9B-IQ4_XS.gguf` | IQ4_XS | 4.3 | ~4.9 GB | Smallest 4-bit, importance matrix |
 | `Harmonic-9B-Q3_K_M.gguf` | Q3_K_M | 3.9 | ~4.6 GB | Smallest footprint, some quality loss |
-## Recommended Quant
 **Q5_K_M** for most users - fits in 8GB VRAM with room for context, minimal quality degradation on reasoning tasks.
 **Q8_0** if you have the VRAM - preserves the full reasoning depth that the model was trained for.
 ## Usage
 ### Ollama
@@ -68,28 +151,57 @@ ollama run DJLougen/Harmonic-9B-GGUF
 Download any quantization and load in LM Studio. The model follows standard ChatML formatting.
-## What Makes This Model Different
-Harmonic-9B was trained with a focus on structural reasoning quality over data volume:
-- Deep reasoning with self-correction, verification, and exploration in every training row (100% quality gate pass rate)
-- 1,817 curated rows following the Less Is More hypothesis - precision over volume
-For agentic tool calling, see Harmonic-Hermes-9B (coming soon).
-## Format
-The model uses `<think>` blocks for reasoning. See the [full model card](https://huggingface.co/DJLougen/Harmonic-9B) for format examples.
-### Vision (Multimodal)
-This model includes `Harmonic-9B-BF16-mmproj.gguf` - the vision projector for multimodal inference. Use with llama.cpp's `--mmproj` flag for image understanding tasks.
 ## License
-Apache 2.0 - fully commercial use permitted.
 ## Links
-- Full model: [DJLougen/Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B)
-- Stage 2 dataset: [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered)

 - self-correction
 - llama.cpp
 - unsloth
+- conversational
 base_model: DJLougen/Harmonic-9B
 ---
 GGUF quantizations of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.
+A reasoning-focused fine-tune of [Qwen 3.5 9B](https://huggingface.co/Qwen/Qwen3.5-9B) trained on structurally validated data where every row passes automated quality gates. No junk, no filler, no shallow traces.
+The name comes from harmonic analysis of reasoning patterns - the structural signal that separates genuine thinking from surface-level chain-of-thought.
+For the agentic tool-calling variant, see [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) (coming soon) - a Stage 2 fine-tune of this model on quality-filtered agent traces from [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered).
 ## Available Quantizations
 | `Harmonic-9B-IQ4_XS.gguf` | IQ4_XS | 4.3 | ~4.9 GB | Smallest 4-bit, importance matrix |
 | `Harmonic-9B-Q3_K_M.gguf` | Q3_K_M | 3.9 | ~4.6 GB | Smallest footprint, some quality loss |
+### Recommended Quant
 **Q5_K_M** for most users - fits in 8GB VRAM with room for context, minimal quality degradation on reasoning tasks.
 **Q8_0** if you have the VRAM - preserves the full reasoning depth that the model was trained for.
+### Vision (Multimodal)
+This model includes `Harmonic-9B-BF16-mmproj.gguf` - the vision projector for multimodal inference. Use with llama.cpp's `--mmproj` flag for image understanding tasks.
+## Training Approach
+![Pipeline](pipeline.png)
+**1,817 curated rows.** That's it. Following the [LIMO hypothesis](https://huggingface.co/papers/2502.03387) (Less Is More for Reasoning), Harmonic uses a small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining - the fine-tune teaches it a reasoning behavior pattern.
+Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
+A small set of everyday conversation data is mixed in to preserve the base model's conversational ability - calibrated by token ratio analysis to prevent the reasoning data from drowning out conversational patterns during training.
+## Training Data Quality
+![Training Quality](training_quality.png)
+The reasoning data was curated using a custom structural process supervision pipeline. Key metrics:
+| Metric | Value |
+|---|---|
+| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
+| Thinking trace depth | 1,667 words average |
+| Self-correction | 100% of rows (17.2 per row avg) |
+| Verification | 100% of rows (10.3 per row avg) |
+| Exploration | 100% of rows (6.3 per row avg) |
+| Quality gate pass rate | 100% |
+Every row was scored across multiple structural dimensions and only rows passing all thresholds simultaneously were included. No rows were manually curated - the pipeline is fully automated and reproducible.
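The "all thresholds simultaneously" rule described above can be illustrated with a minimal sketch. The dimension names and threshold values here are hypothetical, since the card does not publish the actual gates; only the pass-everything-or-be-dropped logic is taken from the text.

```python
# Hypothetical thresholds: the model card does not publish the real gate values.
THRESHOLDS = {
    "signal_score": 60.0,
    "self_corrections": 1,
    "verifications": 1,
    "explorations": 1,
}


def passes_all_gates(row_scores: dict, thresholds: dict = THRESHOLDS) -> bool:
    """Return True only if every scored dimension meets its threshold."""
    return all(
        row_scores.get(name, 0) >= minimum
        for name, minimum in thresholds.items()
    )


def gate_pass_rate(rows: list) -> float:
    """Fraction of rows that pass every gate at once."""
    if not rows:
        return 0.0
    return sum(passes_all_gates(row) for row in rows) / len(rows)


if __name__ == "__main__":
    rows = [
        {"signal_score": 78.7, "self_corrections": 17, "verifications": 10, "explorations": 6},
        {"signal_score": 75.0, "self_corrections": 12, "verifications": 8, "explorations": 0},
    ]
    kept = [row for row in rows if passes_all_gates(row)]
    print(len(kept), gate_pass_rate(rows))
```

A row that is strong on most dimensions but misses one (the second row above has no exploration markers) is dropped, which is why a dataset can show high per-marker percentages and still have a low gate pass rate.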
+## How It Compares
+![Competitor Comparison](competitor_comparison.png)
+We ran our structural quality analysis against seven public reasoning datasets used for Opus/Qwen distillation. The results:
+| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
+|---|---|---|---|---|---|---|---|
+| **Harmonic (ours)** | **1,817** | **1,667** | **100%** | **100%** | **100%** | **78.7** | **100%** |
+| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
+| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
+| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
+| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
+| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
+| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
+| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |
+The popular Opus distillation datasets (Crownelius, nohurry, TeichAI) all have quality gate pass rates below 1%. Their thinking traces average between 188 and 323 words, and self-correction appears in at most 17.2% of rows. Models trained on this data learn to produce short, shallow chain-of-thought that looks like reasoning but lacks the structural behaviors that make reasoning reliable.
+Jackrong and Stratos are closer competitors but still fall short on consistency. Jackrong has massive traces (6,653 words avg) but only 22.7% of rows pass the quality gate - the thinking is verbose but wanders. Stratos has decent markers but only 49.0% of rows pass, meaning about half the gradient updates during training push the model toward shallow patterns.
+Harmonic's data is smaller by design. Every row passes. Every gradient update reinforces genuine reasoning behavior.
+## Reasoning Flow
+![Reasoning Flow](reasoning_flow.png)
+Marker density is measured across 20 equal segments of each thinking trace. The characteristic curve shows reasoning intensity building through the middle of the trace and peaking in the later segments as the model enters verification and self-correction before committing to an answer.
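As a rough illustration of the measurement described above, the sketch below splits a trace into 20 equal word segments and counts marker phrases per word in each. The marker list is hypothetical (loosely based on the phrases quoted in the card), and the actual pipeline may tokenize and score differently.

```python
import re

# Hypothetical marker phrases; the card does not list the real marker set.
MARKERS = ("wait", "let me check", "let me verify", "alternatively")


def marker_density(trace: str, segments: int = 20) -> list[float]:
    """Markers per word in each of `segments` equal slices of the trace."""
    words = trace.lower().split()
    if not words:
        return [0.0] * segments
    densities = []
    for i in range(segments):
        start = i * len(words) // segments
        end = (i + 1) * len(words) // segments
        chunk = " ".join(words[start:end])
        hits = sum(
            len(re.findall(r"\b" + re.escape(marker) + r"\b", chunk))
            for marker in MARKERS
        )
        # Guard against empty slices when the trace has fewer words than segments.
        densities.append(hits / max(end - start, 1))
    return densities


if __name__ == "__main__":
    trace = "filler " * 95 + "wait wait wait wait wait"
    print(marker_density(trace))
```

A multi-word marker that straddles a segment boundary is not counted in this simplified version; for traces averaging well over a thousand words that edge effect should be small.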
+## Training Configuration
+```
+base_model: Qwen/Qwen3.5-9B
+dataset: 1,459 reasoning + 358 conversation rows
+epochs: 1
+learning_rate: 1e-4
+lr_scheduler: cosine
+warmup_ratio: 0.1
+max_seq_length: 8192
+lora_rank: 32
+lora_alpha: 32
+dropout: 0.05
+micro_batch_size: 1
+gradient_accumulation_steps: 4
+weight_decay: 0.01
+```
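To make the scale of this run concrete, here is a small arithmetic sketch derived from the values in the configuration above. It assumes a single device, no sequence packing, and that the final partial batch is kept; none of these are stated in the card, so the step counts are estimates rather than the author's logged values.

```python
import math

# Values taken from the training configuration in the card.
reasoning_rows = 1459
conversation_rows = 358
micro_batch_size = 1
gradient_accumulation_steps = 4
epochs = 1
warmup_ratio = 0.1

# Assumption: the card does not state how many devices were used.
num_devices = 1

total_rows = reasoning_rows + conversation_rows
effective_batch = micro_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(total_rows / effective_batch)
total_steps = steps_per_epoch * epochs
# Rounding up is one common convention; trainers differ on this detail.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"rows={total_rows} effective_batch={effective_batch}")
print(f"optimizer_steps={total_steps} warmup_steps={warmup_steps}")
```

Under these assumptions the 1,817 rows yield 455 optimizer steps with about 46 warmup steps, which shows how few updates a single epoch over this dataset actually involves.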
 ## Usage
 ### Ollama
 Download any quantization and load in LM Studio. The model follows standard ChatML formatting.
+### Reasoning format
+The model uses `<think>` blocks for reasoning:
+```
+<think>
+The user is asking about X. Let me consider two approaches...
+Approach 1: ...
+Approach 2: ...
+I'll go with Approach 1 because...
+Wait, I need to be careful here - this assumes Y, which may not hold.
+Let me verify by checking a special case...
+Yes, that confirms the result.
+</think>
+[Final answer here]
+```
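Client code usually wants the final answer without the reasoning trace. The sketch below is one way to split the two, assuming the output contains at most one `<think>...</think>` block followed by the answer, as in the example above; `split_reasoning` is a hypothetical helper name, not part of any library.

```python
import re

# Non-greedy match so only the first <think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    output = "<think>\nLet me verify by checking a special case...\n</think>\nThe result is 42."
    reasoning, answer = split_reasoning(output)
    print(answer)
```

If generation is cut off before the closing tag, this version treats the whole output as the answer; callers that stream tokens may prefer to handle that case explicitly.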
+## Intended Use
+- Reasoning tasks requiring genuine multi-step thinking
+- Mathematical problem-solving with self-correction
+- Code analysis and generation with structured verification
+- General conversation (conversational ability preserved through training design)
+- Base model for Stage 2 agentic fine-tuning
+## Limitations
+- 9B parameter model - not suitable for tasks requiring extensive world knowledge
+- Reasoning traces can be verbose for simple questions
+- Not optimized for tool calling - see Harmonic-Hermes-9B (coming soon) for agentic use
+- Benchmark evaluation is ongoing
+## Architecture
+- **Base**: Qwen 3.5 9B (9.65B parameters)
+- **Training**: LoRA fine-tuning, merged into base weights
+- **Precision**: BF16
+- **Context**: 8192 tokens
 ## License
+Apache 2.0 - same as the base model. All training data is from Apache 2.0 or MIT licensed sources. Fully commercial use permitted.
 ## Links
+- Full model weights: [DJLougen/Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B)
+- Agentic variant: Harmonic-Hermes-9B (coming soon)
+- Filtered agent dataset: [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered)
+- LIMO paper: [Less is More for Reasoning](https://huggingface.co/papers/2502.03387)

competitor_comparison.png ADDED

pipeline.png ADDED

reasoning_flow.png ADDED

training_quality.png ADDED

Git LFS Details

SHA256: db16402ef1b0d8482a3e5fa42e9114f7d23ebcfef146ce4f8efd46729523893c
Pointer size: 131 Bytes
Size of remote file: 127 kB