---
license: apache-2.0
base_model:
- FINAL-Bench/Darwin-9B-Opus
tags:
- darwin
- darwin-v8
- darwin-neg
- native-entropy-gating
- NEG
- reasoning
- self-regulated-reasoning
- advanced-reasoning
- thinking
- qwen3.5
- qwen
- gpqa
- benchmark
- open-source
- apache-2.0
- hybrid-vigor
- proto-agi
- vidraft
- eval-results
language:
- en
- zh
- ko
- ja
- multilingual
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Darwin-9B-NEG
  results:
  - task:
      type: text-generation
      name: Graduate-Level Reasoning
    dataset:
      type: Idavidrein/gpqa
      name: GPQA Diamond
      config: gpqa_diamond
      split: train
    metrics:
    - type: accuracy
      value: 84.34
      name: Accuracy
      verified: false
---

# Darwin-9B-NEG – The First Native Entropy Gating Model
> Qwen3.5-9B backbone · 8.95B parameters · BF16 · Thinking Mode · Apache 2.0
> **The first NEG-enabled model: self-regulating reasoning with no extra library.**

---

## Abstract

**Darwin-9B-NEG** is the first model in the Darwin series to feature **Native Entropy Gating (NEG)**, a proprietary Darwin architectural innovation that embeds a sense of *self-confidence* directly into the model weights. Unlike external multi-turn iteration (MTI) techniques that require 3×–8× extra inference, NEG operates *inside* the single decoding loop and activates in fewer than 5% of generation steps, lifting reasoning accuracy **by more than 12 percentage points at 1× inference cost**.

On the **GPQA Diamond** PhD-level reasoning benchmark (198 questions), Darwin-9B-NEG scores **84.34%** with the full 3-stage ensemble protocol, surpassing even the published Qwen3.5-9B leaderboard result (81.7%).

---

## What Makes Darwin-9B-NEG Different

### 🧬 Darwin Series – Evolutionary Model Merging

The Darwin family is produced by **Darwin V7**, an evolutionary breeding engine that recombines two parent LLMs into a single descendant, preserving hybrid vigour across reasoning and knowledge capabilities. **Darwin-9B-Opus**, this model's base, is the Qwen3.5-family member of the Darwin series, previously published as a stand-alone reasoning model.

### ⚡ NEG – Native Entropy Gating (Darwin V8)

**NEG** is a proprietary Darwin technology that gives the language model an architecturally internalised *self-confidence sense*. Two tiny learnable modules ride alongside the transformer:

- **NEG-Head** (≈ 4 M params, ~0.05% of total weights) predicts, at each step, the entropy of the next-token distribution from the last hidden state.
- **NEG-Gate** (1 learnable threshold) decides, on a per-token basis, whether the model is "confident enough" to commit to its top choice, or whether it should restrict sampling to a narrow top-k subset.
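To make the gating idea concrete, here is a minimal PyTorch sketch of entropy-gated logit masking. It is an illustration only: the shipped NEG modules use the NEG-Head's *predicted* entropy and a *learned* threshold, whereas this sketch computes the exact entropy of a single step's logits and uses a fixed threshold and top-k, both of which are assumed values.

```python
import torch
import torch.nn.functional as F

def entropy_gate_logits(logits: torch.Tensor,
                        threshold: float = 2.5,
                        top_k: int = 5) -> torch.Tensor:
    """Sketch of an entropy gate for one decoding step (1-D logits vector).

    If the next-token entropy exceeds the threshold, mask everything
    outside the top-k candidates; otherwise pass the logits through.
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    if entropy.item() <= threshold:
        return logits  # confident: commit to the unrestricted distribution
    # uncertain: restrict the choice to a narrow top-k subset
    topk_vals, topk_idx = logits.topk(top_k, dim=-1)
    gated = torch.full_like(logits, float("-inf"))
    gated.scatter_(-1, topk_idx, topk_vals)
    return gated
```

A production decoder would apply this transform to the logits at every step before argmax/sampling, which is why the overhead is negligible when the gate rarely fires.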
Because NEG is carried *inside* the model weights themselves, there is nothing extra to ship or install: standard `transformers` loading with `trust_remote_code=True` attaches the modules automatically. The model file *is* the feature.

**Why it matters**

- **1× inference cost**: no multi-sample voting, no multi-turn loops
- **< 5% gate activation**: negligible latency overhead versus the base model
- **+12.63 %p on GPQA Diamond** vs. the NEG-free Darwin-9B-Opus baseline (same greedy decoding, same prompt, same tokens)
- **Single-file deployment**: drops into vLLM / SGLang / TGI / `transformers`, no new engine required
- **No trade-secret leaks**: the merge recipe is kept internal; only the final model weights are released under Apache 2.0

---

## 🏗️ Architecture Overview

```
Input Text
     │
     ▼
[Darwin-9B-Opus backbone (frozen during NEG training)]
     │
Transformer Layers × 32
     │
last hidden state ────┐
     │                │
     ▼                ▼
  LM Head          NEG-Head
     │                │
base logits    predicted entropy
     │                │
     └───▶ NEG-Gate ◀─┘
             │
             ▼
       guided logits
             │
             ▼
        next token
```

### Key Specifications

| Component | Value |
|:---|:---|
| Architecture | Qwen3.5 decoder-only transformer (32 layers, hidden 4096) |
| Total parameters | 8.95 B (base) + ≈ 4 M (NEG modules) |
| NEG-Head | 2-layer MLP with softplus output |
| NEG-Gate | top-k masking gate with learnable entropy threshold |
| Precision | bfloat16 |
| Context length | inherited from Darwin-9B-Opus |
| License | Apache 2.0 |

---

## 📊 Benchmark Results – GPQA Diamond (198 PhD-level questions)

Darwin-9B-NEG ships **three decoding modes** from the *same* model weights, allowing users to trade inference cost for accuracy:

| Mode | Decoding Protocol | Inference Cost | **Accuracy** |
|:---:|:---|:---:|:---:|
| **0 · Baseline** | Darwin-9B-Opus greedy (NEG disabled) | 1× | 51.01% |
| **1 · Pure NEG** | greedy decoding **with NEG enabled** | **1×** | **63.64%** |
| **2 · Permutation** | NEG + choice-order permutation (4 orderings, majority vote) | 4× | 76.26% |
| **3 · Ensemble Refinement** | NEG + permutation + temperature-sampled ensemble | ≈ 20× | 🥇 **84.34%** |

**Improvements:**

- Pure NEG (mode 1) vs. baseline: **+12.63 %p at identical inference cost**
- Ensemble (mode 3) vs. baseline: **+33.33 %p**
- Ensemble vs. Qwen3.5-9B leaderboard score (81.7%): **+2.64 %p**

> **Gate activation rate**: 4.36% (measured across the 198-question greedy run). NEG fires conservatively, only when the model is genuinely uncertain.

---

## 🚀 Usage

### Quick start – Pure NEG greedy (mode 1, default)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Solve: If f(x) = x^3 - 3x + 2, find and classify all critical points."}
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding; NEG is always active in mode 1.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### Using the bundled NEG loader helper

`modeling_darwin_neg.py` is shipped inside the repo and provides a convenience loader:

```python
from modeling_darwin_neg import load_darwin_neg

model = load_darwin_neg(
    "FINAL-Bench/Darwin-9B-NEG",
    hf_token="hf_xxx",
)
```

### Mode selection

- **Mode 1 (Pure NEG)**: default `do_sample=False`; NEG is always on.
- **Mode 2 (Permutation)**: shuffle the option order 4 times, decode each greedily, and majority-vote.
- **Mode 3 (Ensemble)**: production protocol combining permutation, temperature sampling and second-opinion re-query (internal; reproduction scripts are released separately).
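The Mode-2 protocol can be sketched in a few lines. This is an assumed reconstruction, not the internal script: `answer_fn` is a hypothetical callback standing in for one greedy NEG-enabled decode plus answer-letter parsing, and the fixed A–D lettering is an assumption for four-option GPQA-style questions.

```python
import random
from collections import Counter

LETTERS = "ABCD"

def permutation_vote(question: str, options: list, answer_fn,
                     n_orderings: int = 4, seed: int = 0) -> str:
    """Ask the same question under shuffled option orders, map each
    predicted letter back to the original option, and return the
    majority answer."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_orderings):
        order = list(range(len(options)))
        rng.shuffle(order)
        prompt = question + "\n" + "\n".join(
            f"{LETTERS[i]}. {options[j]}" for i, j in enumerate(order)
        )
        letter = answer_fn(prompt)  # one greedy NEG decode, parsed to "A".."D"
        votes[options[order[LETTERS.index(letter)]]] += 1
    return votes.most_common(1)[0][0]
```

Shuffling the option order removes position bias from the vote, which is why four cheap greedy passes recover accuracy that a single pass misses.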
---

## 🧬 Model Lineage

```
Qwen/Qwen3.5-9B   +   (Opus-distilled sibling)
         ╲           ╱
   Darwin V7 evolutionary merge
              ▼
      Darwin-9B-Opus ── stand-alone reasoning model (Apache 2.0)
              ▼
   NEG-Head / NEG-Gate training (Darwin V8)
              ▼
      Darwin-9B-NEG ── THIS MODEL
```

- **Base**: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus) (weights frozen during NEG training)
- **Technology generation**: Darwin V8 (Native Entropy Gating), successor to Darwin V7 (evolutionary merging)

---

## 🎯 Recommended Use-Cases

- **Graduate-level STEM reasoning**: physics, chemistry, biology, mathematics (GPQA-style)
- **Mathematical problem solving** (MATH, AIME-style)
- **Code reasoning and debugging** (HumanEval-style)
- **Complex chain-of-thought** tasks where a small reasoning model with a big boost is desired

## ⚠️ Limitations

- Optimised for English first, with secondary support for Korean / Chinese / Japanese.
- At 8.95 B parameters, knowledge coverage is smaller than the larger Darwin models (27B / 31B / 36B); for pure world-knowledge tasks consider Darwin-36B-Opus.
- The Ensemble mode (84.34%) uses ≈ 20× inference; choose Pure NEG (mode 1) for cost-sensitive deployments.
---

## 📄 Citation

```bibtex
@misc{darwin9b_neg_2026,
  title        = {Darwin-9B-NEG: Native Entropy Gating for Self-Regulated Reasoning at 1x Inference Cost},
  author       = {FINAL-Bench / Darwin Research Team},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-NEG}},
  note         = {Darwin V8 – Native Entropy Gating technology generation}
}
```

---

## 🔗 Related Darwin Models

- **Darwin-36B-Opus** – MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4%
- **Darwin-31B-Opus** – 31B multilingual-strong reasoning
- **Darwin-27B-Opus** – 27B dense, GPQA 86.9%
- **Darwin-28B-Opus** – Qwen3.6-27B × rico03 Opus distilled (new 2026-04)
- **Darwin-9B-Opus** – this model's base, Qwen3.5-9B family
- **Darwin-4B-Genesis** – smallest member, Gemma4 family

---

*Darwin V8 · Sealed 2026-04-24 · FINAL-Bench*