Upload 2 files

- .gitattributes +1 -0
- README.md +92 -60
- banner.jpeg +3 -0
.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 gemma-4-31b-claude-4.6-opus-thinking-distilled-s7-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+banner.jpeg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED

@@ -1,70 +1,102 @@
 ---
 license: mit
-base_model:
 library_name: transformers
 tags:
-- gemma4
-- gemma
-- reasoning
-- claude-opus
-- distillation
-- full-finetune
-- llm
-- mlm
-- multimodal
-- video
-- text
-- audio
-- vision
-- llama-cpp
-- gguf-my-repo
 language:
-- en
 pipeline_tag: image-text-to-text
 model_name: gemma-4-31B-Claude-4.6-Opus-thinking-distilled-s7
 parameter_count: 30700000000
 ---

-#
New README.md:

---
license: mit
base_model:
- google/gemma-4-31B-it
library_name: transformers
tags:
- gemma4
- gemma
- reasoning
- claude-opus
- distillation
- full-finetune
- llm
- mlm
- multimodal
- video
- text
- audio
- vision
language:
- en
pipeline_tag: image-text-to-text
model_name: gemma-4-31B-Claude-4.6-Opus-thinking-distilled-s7
parameter_count: 30700000000
---

# gemma-4-31B-Claude-4.6-Opus-thinking-distilled-s7-multimodal

<div align="center">
<img src="https://huggingface.co/shreyan35/gemma-4-31B-Claude-4.6-Opus-thinking-distilled-s7/resolve/main/banner.jpeg" width="100%" alt="S7 Banner">
</div>

**_This release is now listed and tuned correctly for multimodality._**

Full-parameter fine-tune of Gemma 4 31B on ~12,000 Claude Opus 4.6 reasoning traces.

## Highlights

- **~90% token accuracy** after 4 epochs
- **Full-parameter SFT**, not LoRA
- **12,000 pure Claude Opus 4.6 traces** — consistent reasoning style, no mixed-model data
- **Native Gemma 4 thinking format** — uses the standard built-in thinking tokens

## Performance

### Reasoning & Knowledge

| Benchmark | S7 Score |
| :--- | :--- |
| MMLU Pro | 90.3% |
| GPQA Diamond | 89.4% |
| BigBench Extra Hard | 78.9% |
| MMMLU (Multilingual) | 93.7% |
| HLE (no tools) | 20.7% |
| HLE (with search) | 28.1% |

### Mathematics & Coding

| Benchmark | S7 Score |
| :--- | :--- |
| AIME 2026 (no tools) | 94.6% |
| LiveCodeBench v6 | 84.8% |
| Codeforces ELO | 2279 |
| HumanEval | 96.7% |
| MBPP Plus | 94.0% |

### Multimodal (Vision & Medical)

| Benchmark | S7 Score |
| :--- | :--- |
| MMMU Pro | 81.5% |
| MATH-Vision | 90.7% |
| MedXPertQA MM | 65.0% |

### Agentic & Long Context

| Benchmark | S7 Score |
| :--- | :--- |
| τ²-bench (Average) | 81.5% |
| τ²-bench (Retail) | 91.6% |
| MRCR v2 (8-needle 128k) | 70.4% |

**Overall improvement: ~6%**

## Model Specifications

- **Parameters:** 30.7B (Dense)
- **Architecture:** 60 layers
- **Context Window:** 256K tokens
- **Vocabulary Size:** 262,144
- **Native Modalities:** Text, Image, Video (frame sequences)
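At a 256K context window, the KV cache rather than the weights often dominates serving memory. A back-of-envelope sketch of that arithmetic, using the 60-layer figure from the specifications above; the KV head count and head dimension are hypothetical placeholders, since this card does not publish them:

```python
# Back-of-envelope KV-cache size at the full 256K context window.
# LAYERS comes from the spec list above; KV_HEADS and HEAD_DIM are
# HYPOTHETICAL placeholders (not published in this card).
LAYERS = 60
KV_HEADS = 8        # assumption: a typical GQA configuration
HEAD_DIM = 128      # assumption
BYTES = 2           # bf16 K/V entries
CONTEXT = 256 * 1024

def kv_cache_gb(tokens: int) -> float:
    """GB needed for the cache: 2 tensors (K and V) per layer, per token."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES
    return tokens * per_token / 1e9

print(f"{kv_cache_gb(CONTEXT):.1f} GB")
```

Under these assumed dimensions, a full 256K-token cache alone runs to roughly 64 GB in bf16, which is why long-context serving usually pairs with KV-cache quantization or shorter windows.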
## Training Data (~12,000 samples)

## Hardware Requirements

| Format | VRAM | Device |
|---|---|---|
| bf16 | ~65GB | 1x A100/H100 80GB |
| Q8 | ~35GB | 2x RTX 4090 |
| **Q4_K_M** | **~20GB** | **RTX 4090** |
| Q3_K_M | ~15GB | RTX 4080 |
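The VRAM column above follows a simple rule of thumb: parameter count times bits per weight, plus a few GB of runtime overhead. A quick sketch; the bits-per-weight values are approximate averages for these quantization formats, not exact figures measured from this file:

```python
# Rough VRAM estimate for the weights alone: parameters x bits per weight.
# Bits-per-weight figures are APPROXIMATE averages for these formats;
# the KV cache and activations add overhead on top.
PARAMS = 30.7e9  # parameter_count from the model card

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.91,
}

def weight_gb(fmt: str) -> float:
    """Gigabytes needed just to hold the (quantized) weights."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:8s} ~{weight_gb(fmt):.1f} GB")
```

The weight-only figures (about 61 GB for bf16, 19 GB for Q4_K_M) sit a few GB under the table's numbers; the gap is the runtime overhead.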
## Credits

- **Sincere apologies to EganAI: the earlier upload failed to credit them properly. This model was sourced from them and is a reupload.**

## License

MIT

banner.jpeg ADDED

Git LFS Details