Gemma 4 E4B — Opus Reasoning V2

A reasoning-enhanced fine-tune of google/gemma-4-E4B-it, distilled from Claude Opus 4.6 reasoning traces with supplementary math Chain-of-Thought data.

Model Details

| Detail | Value |
|---|---|
| Base Model | `google/gemma-4-E4B-it` (4.5B effective params, 8B with embeddings) |
| Architecture | Dense transformer with Per-Layer Embeddings (PLE), 128K context |
| Fine-tuning Method | LoRA via Unsloth |
| Precision | Merged float16 |
| Training Hardware | NVIDIA A100 80GB (RunPod) |
| Training Framework | Unsloth + Hugging Face TRL (`SFTTrainer`) |

LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0 |
| Bias | None |
| Target Modules | Attention + MLP (language layers only) |
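In Unsloth/PEFT terms, the table above corresponds roughly to the following `LoraConfig`. This is a sketch, not the actual training script; the target-module names are assumptions based on common Gemma projection naming.

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above; module names are assumed,
# not taken from the actual training code.
lora_config = LoraConfig(
    r=16,             # rank
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)
```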

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 1e-4 (cosine schedule) |
| Batch Size | 8 (2 per device × 4 gradient accumulation) |
| Warmup Steps | 100 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| Max Sequence Length | 4096 |
| Response-only Training | Yes (user turns masked) |
| Final Training Loss | ~0.54 |
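Expressed as a TRL `SFTConfig`, the hyperparameters above look roughly like the sketch below. Field names follow TRL conventions and are not taken from the actual training script.

```python
from trl import SFTConfig

# Sketch of the training arguments listed above (TRL's SFTConfig).
training_args = SFTConfig(
    num_train_epochs=2,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    warmup_steps=100,
    optim="adamw_8bit",
    weight_decay=0.01,
    max_seq_length=4096,
)
```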

Training Data

Roughly 20,000 samples (19,959 in total) combining reasoning distillation and math Chain-of-Thought data (~40% math content):

| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Claude Opus 4.6 extended reasoning |
| AI-MO/NuminaMath-CoT | 4,000 (sampled) | Math Chain-of-Thought solutions |
| TIGER-Lab/MathInstruct | 4,000 (sampled) | Math CoT + Program-of-Thought |
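The ~40% math share follows directly from the per-dataset sample counts above; a minimal stdlib check (the short dataset keys are ours, not official identifiers):

```python
# Recompute the training-mix composition from the sample counts
# listed in the table above.
counts = {
    "opus-reasoning-filtered": 2326,
    "claude-opus-10000x": 9633,
    "numinamath-cot": 4000,
    "mathinstruct": 4000,
}
math_sets = {"numinamath-cot", "mathinstruct"}

total = sum(counts.values())              # 19,959 samples overall
math = sum(counts[k] for k in math_sets)  # 8,000 math samples
math_share = math / total
print(f"{total} samples, {math_share:.0%} math")
```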

All assistant responses were formatted with `<think>...</think>` blocks to teach the model structured reasoning before answering.
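As a concrete illustration of that target format (a sketch; the actual preprocessing script may differ, and the helper name is ours):

```python
def format_with_think(reasoning: str, answer: str) -> str:
    """Wrap a reasoning trace in <think> tags, followed by the final
    answer, matching the response format described above."""
    return f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.strip()}"

sample = format_with_think(
    "60 km/h means 1 km per minute, so 255 km takes 255 minutes.",
    "255 minutes, i.e. 4 hours 15 minutes.",
)
print(sample)
```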

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/gemma-4-e4b-opus-reasoning-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/gemma-4-e4b-opus-reasoning-v2")

messages = [{"role": "user", "content": "A train travels 60 km/h. How long does it take to cover 255 km?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sampling must be enabled explicitly, or temperature/top_p/top_k are ignored.
output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
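Because responses are trained to open with a `<think>` block, downstream code typically separates the reasoning from the final answer. A small stdlib sketch (the function name is ours):

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split a generated response into (reasoning, answer).
    Returns an empty reasoning string if no <think> block is present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_think(
    "<think>255 / 60 = 4.25 hours.</think>\nIt takes 4 hours 15 minutes."
)
```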

Limitations & Disclaimers

  • This is a reasoning-focused model, not a benchmark-optimized release. It has not been evaluated on standard benchmarks (MMLU, GSM8K, HumanEval, etc.). Performance on such benchmarks is unknown and may differ from the base model.
  • Reasoning style, not reasoning ability. This fine-tune teaches the model to externalize its reasoning in <think> blocks. It does not guarantee improved accuracy over the base model on any given task.
  • Distillation artifacts. The reasoning traces were generated by Claude Opus 4.6. The model may reproduce stylistic patterns, phrasing, or reasoning structures characteristic of the teacher model.
  • Not safety-tuned beyond base. This fine-tune does not add safety training beyond what exists in the base gemma-4-E4B-it model. Users should apply their own safety measures for production use.
  • English only. Training data is predominantly English. Performance in other languages is untested.
  • Small model limitations. At 4.5B effective parameters, the model has inherent capacity limits. Complex multi-step reasoning, nuanced analysis, and knowledge-intensive tasks may be unreliable.
  • No guarantees of factual accuracy. Like all language models, this model can hallucinate, produce incorrect calculations, or generate plausible-sounding but wrong answers.

Intended Use

  • Research and experimentation with reasoning distillation techniques
  • Exploring chain-of-thought behavior in smaller models
  • Personal and educational projects requiring a lightweight reasoning model
  • As a starting point for further fine-tuning

Out of Scope

  • Production systems requiring high reliability or factual accuracy
  • Safety-critical applications (medical, legal, financial advice)
  • Use cases requiring multilingual support
  • Tasks requiring knowledge beyond the base model's training cutoff

Acknowledgments

  • Google for the Gemma 4 model family
  • Unsloth for efficient fine-tuning infrastructure
  • nohurry for the curated Opus 4.6 Reasoning dataset
  • Roman1111111 for the Claude Opus 4.6 10K dataset
  • AI-MO for NuminaMath-CoT
  • TIGER-Lab for MathInstruct

License

This model inherits the Gemma license from the base model. Please review and comply with Google's Gemma Terms of Use.
