Gemma 4 E4B — Opus Reasoning V2
A reasoning-enhanced fine-tune of google/gemma-4-E4B-it, distilled from Claude Opus 4.6 reasoning traces with supplementary math Chain-of-Thought data.
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E4B-it (4.5B effective params, 8B with embeddings) |
| Architecture | Dense transformer with Per-Layer Embeddings (PLE), 128K context |
| Fine-tuning Method | LoRA via Unsloth |
| Precision | Merged float16 |
| Training Hardware | NVIDIA A100 80GB (RunPod) |
| Training Framework | Unsloth + HuggingFace TRL (SFTTrainer) |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0 |
| Bias | None |
| Target Modules | Attention + MLP (language layers only) |
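To make the table concrete, here is a minimal sketch of the LoRA update rule these settings parameterize. This is just the math, not the Unsloth implementation: the frozen weight `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r` (with `r=16`, `alpha=32` from the table, so the scaling is 2.0). The toy matrix sizes and the helper name `apply_lora` are ours, purely for illustration.

```python
r, alpha = 16, 32
scaling = alpha / r  # 32 / 16 = 2.0

def apply_lora(W, A, B, scaling):
    """Return W + scaling * (B @ A) for plain nested-list matrices.

    W: (rows x cols) frozen weight; A: (rank x cols) down-projection;
    B: (rows x rank) up-projection. Toy-sized here; real layers are large.
    """
    rows, cols, rank = len(W), len(W[0]), len(A)
    delta = [[scaling * sum(B[i][k] * A[k][j] for k in range(rank))
              for j in range(cols)] for i in range(rows)]
    return [[W[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]

# With B zero-initialized (standard LoRA init), the adapter starts as a no-op:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]           # rank-1 down-projection (toy)
B = [[0.0], [0.0]]         # zero-initialized up-projection
assert apply_lora(W, A, B, scaling) == W
```

Only attention and MLP projections in the language layers carry these adapters; everything else stays frozen.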
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 1e-4 (cosine schedule) |
| Batch Size | 8 effective (2 per device × 4 gradient accumulation steps) |
| Warmup Steps | 100 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| Max Sequence Length | 4096 |
| Response-only Training | Yes (user turns masked) |
| Final Training Loss | ~0.54 |
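The warmup and cosine settings above combine into the usual warmup-then-decay curve; a pure-Python sketch for the curious (the peak LR and warmup steps come from the table, but `total_steps` is illustrative since the card does not list a step count):

```python
import math

peak_lr = 1e-4       # learning rate from the table
warmup_steps = 100   # warmup steps from the table
total_steps = 2000   # illustrative only -- the actual step count isn't listed

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = 2 * 4  # = 8, matching the table
```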
Training Data
Around 20,000 samples combining reasoning distillation and math Chain-of-Thought data (~40% math content):
| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Claude Opus 4.6 extended reasoning |
| AI-MO/NuminaMath-CoT | 4,000 (sampled) | Math Chain-of-Thought solutions |
| TIGER-Lab/MathInstruct | 4,000 (sampled) | Math CoT + Program-of-Thought |
All assistant responses were formatted with `<think>...</think>` blocks to teach the model structured reasoning before answering.
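To illustrate the target format, an assistant turn might be assembled like this (the helper name `format_target` is ours, not from the actual training pipeline):

```python
def format_target(reasoning: str, answer: str) -> str:
    """Wrap distilled reasoning in <think> tags ahead of the final answer,
    matching the response format described above (hypothetical helper)."""
    return f"<think>\n{reasoning}\n</think>\n\n{answer}"

target = format_target(
    "255 km at 60 km/h -> 255 / 60 = 4.25 h.",
    "It takes 4.25 hours (4 hours 15 minutes).",
)
```

With response-only training, the loss is computed on these formatted assistant turns while the user turns are masked out.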
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/gemma-4-e4b-opus-reasoning-v2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/gemma-4-e4b-opus-reasoning-v2")

messages = [{"role": "user", "content": "A train travels at 60 km/h. How long does it take to cover 255 km?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=1.0, top_p=0.95, top_k=64)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
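Because the model emits its reasoning inside `<think>...</think>` before the answer, you may want to separate the two when post-processing generations. A small helper (ours, not part of the model's tooling) that assumes the tags appear as described above:

```python
import re

def split_think(text: str):
    """Split generated text into (reasoning, answer).

    Returns (None, text) when no <think> block is present, e.g. if
    generation was truncated before the closing tag.
    """
    m = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), m.group(2).strip()

reasoning, answer = split_think("<think>255 / 60 = 4.25</think>\nAbout 4.25 hours.")
```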
Limitations & Disclaimers
- This is a reasoning-focused model, not a benchmark-optimized release. It has not been evaluated on standard benchmarks (MMLU, GSM8K, HumanEval, etc.). Performance on such benchmarks is unknown and may differ from the base model.
- Reasoning style, not reasoning ability. This fine-tune teaches the model to externalize its reasoning in `<think>` blocks. It does not guarantee improved accuracy over the base model on any given task.
- Distillation artifacts. The reasoning traces were generated by Claude Opus 4.6. The model may reproduce stylistic patterns, phrasing, or reasoning structures characteristic of the teacher model.
- Not safety-tuned beyond base. This fine-tune does not add safety training beyond what exists in the base gemma-4-E4B-it model. Users should apply their own safety measures for production use.
- English only. Training data is predominantly English. Performance in other languages is untested.
- Small model limitations. At 4.5B effective parameters, the model has inherent capacity limits. Complex multi-step reasoning, nuanced analysis, and knowledge-intensive tasks may be unreliable.
- No guarantees of factual accuracy. Like all language models, this model can hallucinate, produce incorrect calculations, or generate plausible-sounding but wrong answers.
Intended Use
- Research and experimentation with reasoning distillation techniques
- Exploring chain-of-thought behavior in smaller models
- Personal and educational projects requiring a lightweight reasoning model
- As a starting point for further fine-tuning
Out of Scope
- Production systems requiring high reliability or factual accuracy
- Safety-critical applications (medical, legal, financial advice)
- Use cases requiring multilingual support
- Tasks requiring knowledge beyond the base model's training cutoff
Acknowledgments
- Google for the Gemma 4 model family
- Unsloth for efficient fine-tuning infrastructure
- nohurry for the curated Opus 4.6 Reasoning dataset
- Roman1111111 for the Claude Opus 4.6 10K dataset
- AI-MO for NuminaMath-CoT
- TIGER-Lab for MathInstruct
License
This model inherits the Gemma license from the base model. Please review and comply with Google's Gemma Terms of Use.