# Model Card: Galena-2B (Granite 3.3 Math & Physics)

## Model Description

**Galena-2B** is a specialized language model in the 2B-parameter class, optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (QLoRA) on curated datasets focused on advanced calculations and physics concepts.

- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct

## Model Architecture

- **Architecture:** GraniteForCausalLM
- **Parameters:** ~2.5B (nominally "2B")
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)
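
As a sanity check on the figures above, the weight count can be estimated from the listed dimensions. This is a rough sketch, not an official breakdown: it assumes a gated MLP with three projection matrices, no bias terms, one weight vector per normalization layer, and input/output embeddings that share a single matrix. None of these assumptions are stated in this card.

```python
# Dimensions copied from the architecture list above.
VOCAB_SIZE = 49_159
HIDDEN_SIZE = 2_048
NUM_LAYERS = 40
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
INTERMEDIATE_SIZE = 8_192


def estimate_parameters() -> int:
    """Estimate the weight count under the assumptions named in the lead-in."""
    head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS   # 64
    kv_dim = NUM_KV_HEADS * head_dim            # 512, smaller than hidden size because of GQA
    # Query and output projections are hidden x hidden; key and value are hidden x kv_dim.
    attention = 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * kv_dim
    # Assumed gated MLP: gate, up, and down projections.
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE
    # Assumed two normalization weight vectors per layer.
    norms = 2 * HIDDEN_SIZE
    per_layer = attention + mlp + norms
    # Assumed tied embeddings: counted once.
    embeddings = VOCAB_SIZE * HIDDEN_SIZE
    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * per_layer + embeddings + final_norm


total = estimate_parameters()
size_bytes = total * 2  # 2 bytes per weight for bfloat16 or float16
print(f"{total / 1e9:.2f}B parameters")
print(f"{size_bytes / 1e9:.2f} GB = {size_bytes / 2**30:.2f} GiB at 16-bit precision")
```

Under these assumptions the estimate comes to about 2.53B weights, or roughly 5.07 GB (4.72 GiB) at 16-bit precision.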

### Key Features

- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE Embeddings** with extended context support (theta=10M)
- **Attention & Logits Scaling** for training stability
- **Embedding Multiplier (12.0) and Residual Multiplier (0.22)**

## Intended Use

### Primary Use Cases

- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines

### Out-of-Scope Use

- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs

## Training Data

The model was fine-tuned on a carefully curated dataset of 26,000 instruction-response pairs blending two specialized datasets:

### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces

### 2. CAMEL-AI Physics Dataset

- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning

### Data Preparation

- **Preprocessing:** `scripts/prepare_math_physics.py` in parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Max sequence length capped at 512 tokens during training
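
The format conversion step can be pictured with a small sketch. This is not the contents of `scripts/prepare_math_physics.py`, which is not shown here: the function names, the `text` and `source` field names, the newline layout around the tags, and the sample question are all hypothetical. Only the `<|user|>`/`<|assistant|>` tags and the JSONL output come from the list above.

```python
import json

# Tags named in the Data Preparation list; the layout around them is an assumption.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def to_chat_text(instruction: str, response: str) -> str:
    """Join one instruction-response pair into a single tagged training string."""
    return f"{USER_TAG}\n{instruction.strip()}\n{ASSISTANT_TAG}\n{response.strip()}"


def to_jsonl_line(instruction: str, response: str, source: str) -> str:
    """Serialize one example as a JSONL line (field names are hypothetical)."""
    record = {"text": to_chat_text(instruction, response), "source": source}
    return json.dumps(record, ensure_ascii=False)


# Made-up example pair, for illustration only.
line = to_jsonl_line(
    "What is the SI unit of force?",
    "The newton (N).",
    "camel-ai/physics",
)
print(line)
```

Keeping a `source` field alongside the text would make it possible to check the blend ratio of the two datasets after merging.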

## Training Procedure

### Training Hyperparameters

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, Key, Value, Output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16
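
A minimal sketch of how these settings could map onto `BitsAndBytesConfig` (Transformers) and `LoraConfig` (PEFT) follows. The rank and alpha values are placeholders, since the card only says "default", and the module names `q_proj`, `k_proj`, `v_proj`, and `o_proj` are assumed to correspond to the query, key, value, and output projections listed above. The actual training script may configure these differently.

```python
# Keyword arguments mirroring the hyperparameter list above.
QUANT_KWARGS = {
    "load_in_4bit": True,                   # 4-bit base model weights
    "bnb_4bit_quant_type": "nf4",           # NF4 quantization
    "bnb_4bit_compute_dtype": "bfloat16",   # bfloat16 compute
}

LORA_KWARGS = {
    "r": 16,            # placeholder: the card does not record the rank used
    "lora_alpha": 32,   # placeholder: the card does not record alpha
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}


def build_configs():
    """Build the config objects; imports are deferred so the dicts can be inspected without a GPU stack."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_kwargs = dict(QUANT_KWARGS)
    # Resolve the dtype name to the torch dtype object.
    quant_kwargs["bnb_4bit_compute_dtype"] = getattr(torch, quant_kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**quant_kwargs), LoraConfig(**LORA_KWARGS)
```

The quantization config would be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=...)` and the LoRA config to `peft.get_peft_model`.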

### Training Configuration

```json
{
    "base_model": "ibm-granite/granite-3.3-2b-instruct",
    "dataset_path": "data/math_physics.jsonl",
    "output_dir": "outputs/granite-math-physics-lora",
    "use_4bit": true,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 4,
    "num_train_epochs": 1,
    "max_steps": 500,
    "max_seq_length": 512,
    "learning_rate": "2e-4 (default)",
    "batching_strategy": "padding",
    "optimizer": "paged_adamw_8bit",
    "bf16": true
}
```
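
The numbers in this configuration determine how much of the dataset a run actually covers. The arithmetic below is a sketch that assumes a single GPU (as listed under Training Infrastructure) and that `max_steps` ends the run when it is reached, which is how the Hugging Face `Trainer` treats that setting.

```python
# Values copied from the configuration above and the stated dataset size.
per_device_batch = 1
gradient_accumulation_steps = 4
max_steps = 500
dataset_size = 26_000

# Examples consumed per optimizer step on one GPU.
effective_batch = per_device_batch * gradient_accumulation_steps
# Examples consumed before the step cap is hit.
examples_seen = max_steps * effective_batch
# Share of the dataset covered by the capped run.
fraction_of_dataset = examples_seen / dataset_size
# Optimizer steps a full pass would need (ceiling division).
steps_for_full_epoch = -(-dataset_size // effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({fraction_of_dataset:.1%} of the dataset)")
print(f"steps for one full epoch: {steps_for_full_epoch}")
```

With these values, a 500-step run covers about 2,000 examples, while a complete pass over 26,000 examples would take 6,500 steps.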

### Training Infrastructure

- **Hardware:** NVIDIA GeForce RTX 4060 (8GB VRAM)
- **Software Stack:**
  - PyTorch 2.x
  - Hugging Face Transformers 4.44+
  - PEFT 0.11+
  - bitsandbytes 0.43+
  - CUDA 12.1
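- **Training Time:** 500 optimizer steps at an effective batch size of 4, about 2,000 examples (roughly 8% of the 26k-example dataset; the `max_steps` cap of 500 ends the run well before the single configured epoch would complete)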
- **Checkpointing:** LoRA adapters saved every N steps

### Post-Training

1. **Adapter Merging:** LoRA adapters merged back into base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
   - Hugging Face Transformers (safetensors)
   - GGUF F16 (llama.cpp compatible)
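
The adapter-merging step can be sketched with PEFT's `merge_and_unload`. This is an illustration under assumptions, not the contents of `scripts/merge_lora.py`: the function name and the output directory are hypothetical, and the adapter directory is taken from `output_dir` in the training configuration.

```python
def merge_lora_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Fold LoRA adapter weights into the base model and save a standalone checkpoint."""
    # Imports are deferred so the function can be defined without the GPU stack installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model unquantized so the merged weights are full 16-bit tensors.
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
    # Attach the adapter, then merge its low-rank updates into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(output_dir, safe_serialization=True)  # writes safetensors
    # Save the tokenizer next to the weights so the directory is self-contained.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)


# Example call (the output path is hypothetical):
# merge_lora_adapter(
#     "ibm-granite/granite-3.3-2b-instruct",
#     "outputs/granite-math-physics-lora",
#     "outputs/granite-math-physics-merged",
# )
```

The merged directory is what a converter such as `convert_hf_to_gguf.py` would take as input; in recent llama.cpp checkouts it accepts `--outfile` and `--outtype f16`.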

## Evaluation

### Qualitative Assessment

The model demonstrates improved performance on:

- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation

### Limitations

- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; full 128k context not extensively tested

## Bias, Risks, and Limitations

### Known Limitations

1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5. **Small Training Set:** 26k examples may not cover all edge cases

### Ethical Considerations

- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms

### Recommendations

- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use

## Environmental Impact

- **Hardware:** NVIDIA RTX 4060 (8GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training
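
The energy estimate can be checked with back-of-envelope arithmetic. The sketch below assumes the GPU draws its rated board power of 115 W (NVIDIA's listed figure for the desktop RTX 4060) for the whole run and ignores the CPU and the rest of the system, so it is an approximation only.

```python
# Assumption: rated board power for a desktop RTX 4060; actual draw was not measured.
GPU_POWER_WATTS = 115


def energy_kwh(hours: float, watts: float = GPU_POWER_WATTS) -> float:
    """Convert a constant power draw over a duration into kilowatt-hours."""
    return watts * hours / 1000


# Bounds of the 1-2 hour duration estimate given above.
low = energy_kwh(1)
high = energy_kwh(2)
print(f"GPU energy: {low:.3f} to {high:.3f} kWh")
```

Even at the upper end of the duration estimate, the GPU-only figure stays well under 1 kWh.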

## Technical Specifications

### Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

### System Requirements

**Minimum (CPU Inference):**
- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support

**Recommended (GPU Inference):**
- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+

### Loading & Inference

Before running inference, pull the artifacts into `models/math-physics/`:

```bash
python scripts/download_artifacts.py --artifact all
```

**Transformers (Python):**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint from the local artifact directory.
model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",  # place weights on the available GPU/CPU automatically
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```
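
Loading alone does not produce output. A generation call might look like the sketch below, which assumes the saved tokenizer ships a chat template and that the artifacts sit in `models/math-physics/hf` as described above. The function name, the sample question, and the `max_new_tokens` default are illustrative choices, not settings from this card.

```python
MODEL_DIR = "models/math-physics/hf"

# Illustrative prompt; any chat-style message list works here.
messages = [
    {"role": "user", "content": "A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?"},
]


def generate_reply(chat, max_new_tokens: int = 256) -> str:
    """Load the model, format the chat with the tokenizer's template, and decode only the new tokens."""
    # Imports are deferred so the message list can be built without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_DIR, device_map="auto", torch_dtype=torch.bfloat16
    )
    inputs = tokenizer.apply_chat_template(
        chat,
        add_generation_prompt=True,  # append the assistant turn marker
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt so only the model's reply is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)


# Usage (requires the downloaded artifacts):
# print(generate_reply(messages))
```

Because the model can produce plausible but incorrect solutions, replies to numeric questions like this one are worth checking against a calculator or symbolic engine.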

**llama.cpp (Command Line):**
```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```

## Citation

```bibtex
@software{galena_2b_2024,
  title = {Galena-2B: Granite 3.3 Math \& Physics Model},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/galena-2B},
  note = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```

## Acknowledgments

- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries

## Contact

For questions, issues, or contributions:
- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com

## Changelog

### Version 1.0 (2024-11-17)

- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released