File size: 9,140 Bytes
ac75d74
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
# Model Card: Galena-2B (Granite 3.3 Math & Physics)

## Model Description

**Galena-2B** is a specialized 2-billion parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.

- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct

## Model Architecture

- **Architecture:** GraniteForCausalLM
- **Parameters:** 2.0B
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)

### Key Features

- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE Embeddings** with extended context support (theta=10M)
- **Attention & Logits Scaling** for training stability
- **Embedding Multiplier** (12.0) and Residual Multiplier (0.22)

## Intended Use

### Primary Use Cases

- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines

### Out-of-Scope Use

- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs

## Training Data

The model was fine-tuned on a carefully curated dataset of 26,000 instruction-response pairs blending two specialized datasets:

### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces

### 2. CAMEL-AI Physics Dataset

- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning

### Data Preparation

- **Preprocessing:** `scripts/prepare_math_physics.py` in parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Max sequence length capped at 512 tokens during training

## Training Procedure

### Training Hyperparameters

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, Key, Value, Output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16

### Training Configuration

```python

{

    "base_model": "ibm-granite/granite-3.3-2b-instruct",

    "dataset_path": "data/math_physics.jsonl",

    "output_dir": "outputs/granite-math-physics-lora",

    "use_4bit": true,

    "per_device_train_batch_size": 1,

    "gradient_accumulation_steps": 4,

    "effective_batch_size": 4,

    "num_train_epochs": 1,

    "max_steps": 500,

    "max_seq_length": 512,

    "learning_rate": "2e-4 (default)",

    "batching_strategy": "padding",

    "optimizer": "paged_adamw_8bit",

    "bf16": true

}

```

### Training Infrastructure

- **Hardware:** NVIDIA GeForce RTX 4060 (8GB VRAM)
- **Software Stack:**
  - PyTorch 2.x
  - Hugging Face Transformers 4.44+
  - PEFT 0.11+
  - bitsandbytes 0.43+
  - CUDA 12.1
- **Training Time:** ~500 steps (1 epoch over 26k examples with batch size 4)
- **Checkpointing:** LoRA adapters saved every N steps

### Post-Training

1. **Adapter Merging:** LoRA adapters merged back into base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
   - Hugging Face Transformers (safetensors)
   - GGUF F16 (llama.cpp compatible)

## Evaluation

### Qualitative Assessment

The model demonstrates improved performance on:

- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation

### Limitations

- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; full 128k context not extensively tested

## Bias, Risks, and Limitations

### Known Limitations

1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5. **Small Training Set:** 26k examples may not cover all edge cases

### Ethical Considerations

- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms

### Recommendations

- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use

## Environmental Impact

- **Hardware:** NVIDIA RTX 4060 (8GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training

## Technical Specifications

### Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

### System Requirements

**Minimum (CPU Inference):**
- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support

**Recommended (GPU Inference):**
- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+

### Loading & Inference

Before running inference, pull the artifacts into `models/math-physics/`:

```bash

python scripts/download_artifacts.py --artifact all

```

**Transformers (Python):**
```python

from transformers import AutoModelForCausalLM, AutoTokenizer



model = AutoModelForCausalLM.from_pretrained(

    "models/math-physics/hf",

    device_map="auto",

    trust_remote_code=True

)

tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")

```

**llama.cpp (Command Line):**
```bash

./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256

```

## Citation

```bibtex

@software{galena_2b_2024,

  title = {Galena-2B: Granite 3.3 Math & Physics Model},

  author = {Your Name},

  year = {2024},

  url = {https://github.com/yourusername/galena-2B},

  note = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}

}

```

## Acknowledgments

- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries

## Contact

For questions, issues, or contributions:
- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com

## Changelog

### Version 1.0 (2024-11-17)

- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released