Gemma 4 E4B — Opus Reasoning V2
A reasoning-enhanced fine-tune of google/gemma-4-E4B-it, distilled from Claude Opus 4.6 reasoning traces with supplementary math Chain-of-Thought data.
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E4B-it (4.5B effective params, 8B with embeddings) |
| Architecture | Dense transformer with Per-Layer Embeddings (PLE), 128K context |
| Fine-tuning Method | LoRA via Unsloth |
| Precision | Merged float16 |
| Training Hardware | NVIDIA A100 80GB (RunPod) |
| Training Framework | Unsloth + HuggingFace TRL (SFTTrainer) |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0 |
| Bias | None |
| Target Modules | Attention + MLP (language layers only) |
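To make the table concrete, here is a minimal sketch of the LoRA update rule these settings parameterize. This is just the math, not the Unsloth implementation: the frozen weight `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r` (with `r=16`, `alpha=32` from the table, so the scaling is 2.0). The toy matrix sizes and the helper name `apply_lora` are ours, purely for illustration.

```python
r, alpha = 16, 32
scaling = alpha / r  # 32 / 16 = 2.0

def apply_lora(W, A, B, scaling):
    """Return W + scaling * (B @ A) for plain nested-list matrices.

    W: (rows x cols) frozen weight; A: (rank x cols) down-projection;
    B: (rows x rank) up-projection. Toy-sized here; real layers are large.
    """
    rows, cols, rank = len(W), len(W[0]), len(A)
    delta = [[scaling * sum(B[i][k] * A[k][j] for k in range(rank))
              for j in range(cols)] for i in range(rows)]
    return [[W[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]

# With B zero-initialized (standard LoRA init), the adapter starts as a no-op:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]           # rank-1 down-projection (toy)
B = [[0.0], [0.0]]         # zero-initialized up-projection
assert apply_lora(W, A, B, scaling) == W
```

Only attention and MLP projections in the language layers carry these adapters; everything else stays frozen.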
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 1e-4 (cosine schedule) |
| Batch Size | 8 effective (2 per device × 4 gradient accumulation steps) |
| Warmup Steps | 100 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| Max Sequence Length | 4096 |
| Response-only Training | Yes (user turns masked) |
| Final Training Loss | ~0.54 |
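The warmup and cosine settings above combine into the usual warmup-then-decay curve; a pure-Python sketch for the curious (the peak LR and warmup steps come from the table, but `total_steps` is illustrative since the card does not list a step count):

```python
import math

peak_lr = 1e-4       # learning rate from the table
warmup_steps = 100   # warmup steps from the table
total_steps = 2000   # illustrative only -- the actual step count isn't listed

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = 2 * 4  # = 8, matching the table
```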
Training Data
Around 20,000 samples combining reasoning distillation and math Chain-of-Thought data (~40% math content):
| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Claude Opus 4.6 extended reasoning |
| AI-MO/NuminaMath-CoT | 4,000 (sampled) | Math Chain-of-Thought solutions |
| TIGER-Lab/MathInstruct | 4,000 (sampled) | Math CoT + Program-of-Thought |
All assistant responses were formatted with `<think>...</think>` blocks to teach the model structured reasoning before answering.
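To illustrate the target format, an assistant turn might be assembled like this (the helper name `format_target` is ours, not from the actual training pipeline):

```python
def format_target(reasoning: str, answer: str) -> str:
    """Wrap distilled reasoning in <think> tags ahead of the final answer,
    matching the response format described above (hypothetical helper)."""
    return f"<think>\n{reasoning}\n</think>\n\n{answer}"

target = format_target(
    "255 km at 60 km/h -> 255 / 60 = 4.25 h.",
    "It takes 4.25 hours (4 hours 15 minutes).",
)
```

With response-only training, the loss is computed on these formatted assistant turns while the user turns are masked out.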
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/gemma-4-e4b-opus-reasoning-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/gemma-4-e4b-opus-reasoning-v2")

messages = [{"role": "user", "content": "A train travels at 60 km/h. How long does it take to cover 255 km?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=1.0, top_p=0.95, top_k=64)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
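Because the model emits its reasoning inside `<think>...</think>` before the answer, you may want to separate the two when post-processing generations. A small helper (ours, not part of the model's tooling) that assumes the tags appear as described above:

```python
import re

def split_think(text: str):
    """Split generated text into (reasoning, answer).

    Returns (None, text) when no <think> block is present, e.g. if
    generation was truncated before the closing tag.
    """
    m = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), m.group(2).strip()

reasoning, answer = split_think("<think>255 / 60 = 4.25</think>\nAbout 4.25 hours.")
```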
Limitations & Disclaimers
- This is a reasoning-focused model, not a benchmark-optimized release. It has not been evaluated on standard benchmarks (MMLU, GSM8K, HumanEval, etc.). Performance on such benchmarks is unknown and may differ from the base model.
- Reasoning style, not reasoning ability. This fine-tune teaches the model to externalize its reasoning in `<think>` blocks. It does not guarantee improved accuracy over the base model on any given task.
- Distillation artifacts. The reasoning traces were generated by Claude Opus 4.6. The model may reproduce stylistic patterns, phrasing, or reasoning structures characteristic of the teacher model.
- Not safety-tuned beyond base. This fine-tune does not add safety training beyond what exists in the base gemma-4-E4B-it model. Users should apply their own safety measures for production use.
- English only. Training data is predominantly English. Performance in other languages is untested.
- Small model limitations. At 4.5B effective parameters, the model has inherent capacity limits. Complex multi-step reasoning, nuanced analysis, and knowledge-intensive tasks may be unreliable.
- No guarantees of factual accuracy. Like all language models, this model can hallucinate, produce incorrect calculations, or generate plausible-sounding but wrong answers.
Intended Use
- Research and experimentation with reasoning distillation techniques
- Exploring chain-of-thought behavior in smaller models
- Personal and educational projects requiring a lightweight reasoning model
- As a starting point for further fine-tuning
Out of Scope
- Production systems requiring high reliability or factual accuracy
- Safety-critical applications (medical, legal, financial advice)
- Use cases requiring multilingual support
- Tasks requiring knowledge beyond the base model's training cutoff
Acknowledgments
- Google for the Gemma 4 model family
- Unsloth for efficient fine-tuning infrastructure
- nohurry for the curated Opus 4.6 Reasoning dataset
- Roman1111111 for the Claude Opus 4.6 10K dataset
- AI-MO for NuminaMath-CoT
- TIGER-Lab for MathInstruct
License
This model inherits the Gemma license from the base model. Please review and comply with Google's Gemma Terms of Use.