---
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - reasoning
  - qwen3.5
  - conversational
  - unsloth
  - self-correction
  - chain-of-thought
  - speculative-decoding
base_model: unsloth/Qwen3.5-27B
pipeline_tag: text-generation
---

# Harmonic-27B

The flagship of the Harmonic family. A reasoning-focused fine-tune of Qwen 3.5 27B trained on structurally validated data where every row passes automated quality gates. No junk, no filler, no shallow traces.

It scales the training approach proven on Harmonic-9B to 27B parameters and pairs with Harmonic-2B as a draft model for speculative decoding.

## The Harmonic Family

| Model | Parameters | Role |
|---|---|---|
| Harmonic-2B | 2.3B | Draft model for speculative decoding |
| Harmonic-9B | 9.65B | Mid-range reasoning backbone |
| Harmonic-Hermes-9B | 9.65B | Stage 2 agentic variant (tool calling) |
| Harmonic-27B | 27B | Flagship reasoning model |

All models share the same training data and reasoning format, enabling speculative decoding across the family with high acceptance rates.

## Training Approach

The pipeline is the same as Harmonic-9B's: 1,817 curated rows following the LIMO hypothesis, i.e. a small, precisely curated dataset instead of tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining; the fine-tune teaches it a reasoning behavior pattern.

Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.

A small set of everyday conversation data is mixed in to preserve the base model's conversational ability.
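For illustration, a single training row might look like the sketch below. The schema, field names, and problem content are hypothetical, not taken from the actual dataset; only the reasoning-behavior pattern (explore, self-correct, verify) reflects what the card describes.

```python
# Hypothetical example row; the real dataset's schema and contents are not published.
example_row = {
    "messages": [
        {"role": "user", "content": "Is 391 prime?"},
        {
            "role": "assistant",
            "content": (
                "<|thinking|>\n"
                "391 is odd and not divisible by 3 or 5.\n"
                "Alternatively, I could look for factors near sqrt(391), which is about 19.8.\n"
                "Trying 17: 17 * 23 = 391. Wait, let me verify that: 17 * 20 = 340 and 17 * 3 = 51, "
                "and 340 + 51 = 391. Yes, that confirms it.\n"
                "<|/thinking|>\n\n"
                "No. 391 = 17 * 23, so it is not prime."
            ),
        },
    ]
}
```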

## Training Data Quality

Curated using a custom structural process supervision pipeline (an illustrative sketch of this style of check follows the table):

| Metric | Value |
|---|---|
| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
| Thinking trace depth | 1,667 words average |
| Self-correction | 100% of rows (17.2 per row avg) |
| Verification | 100% of rows (10.3 per row avg) |
| Exploration | 100% of rows (6.3 per row avg) |
| Quality gate pass rate | 100% |
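The exact pipeline is custom and not published, but a minimal sketch of the kind of structural check it performs might look like this. The marker phrases and the gate rule below are assumptions for illustration, and the composite signal score is not reproduced here.

```python
import re

# Hypothetical marker patterns; the real pipeline's phrase lists and thresholds are not published.
SELF_CORRECTION = re.compile(r"(wait,|that's not right|actually, i)", re.IGNORECASE)
VERIFICATION = re.compile(r"(let me check|plugging back in|verify|that confirms)", re.IGNORECASE)
EXPLORATION = re.compile(r"(alternatively|another approach|i could try)", re.IGNORECASE)

def structural_stats(thinking_trace: str) -> dict:
    """Count reasoning-behavior markers in one thinking trace."""
    return {
        "think_words": len(thinking_trace.split()),
        "self_correction": len(SELF_CORRECTION.findall(thinking_trace)),
        "verification": len(VERIFICATION.findall(thinking_trace)),
        "exploration": len(EXPLORATION.findall(thinking_trace)),
    }

def passes_gates(stats: dict) -> bool:
    """Illustrative gate: every behavior must appear at least once in the trace."""
    return all(stats[k] > 0 for k in ("self_correction", "verification", "exploration"))
```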

## How It Compares

The same structural quality analysis run against every major public reasoning dataset:

| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
|---|---|---|---|---|---|---|---|
| Harmonic (ours) | 1,817 | 1,667 | 100% | 100% | 100% | 78.7 | 100% |
| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |

## Training Configuration

```yaml
base_model: unsloth/Qwen3.5-27B
dataset: 1,459 reasoning + 358 conversation rows
epochs: 1
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.1
max_seq_length: 8192
lora_rank: 32
lora_alpha: 32
dropout: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 4
weight_decay: 0.01
```
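For readers who want to reproduce the setup, these hyperparameters map roughly onto a standard PEFT configuration like the sketch below. The target modules, trainer wiring, and output path are assumptions, not the exact training script.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Sketch only; target_modules and output_dir are assumptions, not the actual training script.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="harmonic-27b-lora",
    num_train_epochs=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=1,   # micro_batch_size: 1
    gradient_accumulation_steps=4,
    weight_decay=0.01,
    bf16=True,
)
# max_seq_length (8192) is applied at tokenization / SFT-trainer level, not in TrainingArguments.
```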

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" shards the 27B model across available GPUs; torch_dtype="auto" keeps the native BF16.
model = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-27B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-27B")
```
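A minimal end-to-end generation sketch using the model and tokenizer loaded above; the chat-template call is the standard transformers API, and the prompt and token budget are just examples.

```python
messages = [{"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed?"}]

# Build the prompt with the model's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```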

### With speculative decoding (Harmonic-2B as draft)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-27B")
target = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-27B", torch_dtype="auto", device_map="auto")
draft = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-2B", torch_dtype="auto", device_map="auto")

inputs = tokenizer("Is 391 a prime number?", return_tensors="pt").to(target.device)

# The 2B draft proposes tokens that the 27B target verifies (assisted generation).
outputs = target.generate(
    **inputs,
    assistant_model=draft,
    max_new_tokens=512,
)
```

## Reasoning format

The model uses think blocks for reasoning:

```
<|thinking|>
The user is asking about X. Let me consider two approaches...

Approach 1: ...
Approach 2: ...

I will go with Approach 1 because...

Wait, I need to be careful here - this assumes Y, which may not hold.
Let me verify by checking a special case...

Yes, that confirms the result.
<|/thinking|>

[Final answer here]
```
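To separate the reasoning trace from the final answer in downstream code, a simple parser like the following works (a sketch that assumes the delimiters appear exactly as shown above):

```python
import re

def split_thinking(completion: str) -> tuple[str, str]:
    """Return (thinking_trace, final_answer) from a generated completion."""
    match = re.search(r"<\|thinking\|>(.*?)<\|/thinking\|>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()
    return match.group(1).strip(), completion[match.end():].strip()
```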

## Intended Use

- Complex reasoning tasks requiring deep multi-step thinking
- Mathematical problem-solving with self-correction and verification
- Code analysis, generation, and debugging with structured reasoning
- General conversation (conversational ability preserved through training design)
- Base model for Stage 2 agentic fine-tuning (Harmonic-Hermes-27B)
- Target model for speculative decoding with Harmonic-2B

## Limitations

- 27B parameters: requires significant compute (a single A100 80GB GPU or equivalent)
- Reasoning traces can be verbose for simple questions
- Not optimized for tool calling; an agentic Stage 2 variant is planned
- Benchmark evaluation is ongoing

## Architecture

- Base: Qwen 3.5 27B
- Training: LoRA fine-tuning, merged into base weights (see the merge sketch below)
- Precision: BF16
- Context: 8192 tokens
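The merge step mentioned above can be reproduced with PEFT's standard merge utility. The adapter path and output directory here are hypothetical placeholders, not released artifacts.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical adapter path: attach the LoRA adapter to the base model and merge it into the weights.
base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-27B", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "path/to/harmonic-27b-lora").merge_and_unload()
merged.save_pretrained("Harmonic-27B-merged")
```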

## License

Apache 2.0, the same as the base model. All training data comes from Apache 2.0 or MIT licensed sources. Commercial use is fully permitted.

## Links