---
library_name: transformers
language:
- en
model-index:
- name: Mellum2 Base
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 41.46
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: humaneval_plus
      name: HumanEval+
    metrics:
    - name: pass@1
      type: pass@1
      value: 37.2
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mbpp
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 62.4
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mbpp_plus
      name: MBPP+
    metrics:
    - name: pass@1
      type: pass@1
      value: 78.31
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: multipl-e
      name: MultiPL-E HumanEval, 7 languages
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.97
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: cruxeval
      name: CRUXEval-I
    metrics:
    - name: pass@1
      type: pass@1
      value: 45.38
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: cruxeval
      name: CRUXEval-O
    metrics:
    - name: pass@1
      type: pass@1
      value: 43.88
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: cais/mmlu
      name: MMLU
    metrics:
    - name: accuracy
      type: acc
      value: 70.87
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mmlu-pro
      name: MMLU-Pro
    metrics:
    - name: exact match
      type: exact_match
      value: 59.31
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: bbh
      name: BBH
    metrics:
    - name: exact match
      type: exact_match
      value: 74.9
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: ai2_arc
      name: ARC-Challenge
    metrics:
    - name: normalized accuracy
      type: acc_norm
      value: 53.5
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: hellaswag
      name: HellaSwag
    metrics:
    - name: normalized accuracy
      type: acc_norm
      value: 73.72
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: winogrande
      name: WinoGrande
    metrics:
    - name: accuracy
      type: acc
      value: 65.51
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: truthful_qa
      name: TruthfulQA MC2
    metrics:
    - name: MC2
      type: mc2
      value: 44.51
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: gsm8k
      name: GSM8K
    metrics:
    - name: exact match
      type: exact_match
      value: 81.73
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: hendrycks_math
      name: MATH
    metrics:
    - name: exact match
      type: exact_match
      value: 9.96
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: gpqa
      name: GPQA Diamond
    metrics:
    - name: accuracy
      type: acc
      value: 31.31
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: gpqa
      name: GPQA Main
    metrics:
    - name: accuracy
      type: acc
      value: 35.04
      verified: false
license: apache-2.0
---
Mellum2 Base
Use this checkpoint as the starting point for your own fine-tuning, alignment, or domain adaptation on top of the long-context base. For instruction-following or reasoning tasks out of the box, use Instruct or Thinking instead.
Mellum2 Base Highlights
Mellum2 Base is a long-context pretrained causal language model trained by JetBrains.
The model uses a Mixture-of-Experts architecture with 64 experts and activates 8 experts per token. It uses a combination of sliding-window and full attention layers, with a context length of 131,072 tokens.
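To make "activates 8 experts per token" concrete, here is a minimal sketch of top-k routing for a single token. The router scores are random stand-ins, and normalizing with a softmax over the selected experts is an assumption made for illustration; this card does not describe Mellum2's actual router.

```python
import math
import random

NUM_EXPERTS = 64  # from the model card
TOP_K = 8         # experts activated per token, from the model card


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Weights are a softmax over the selected logits only. This is an
    illustrative choice, not necessarily what Mellum2 does.
    """
    # Indices of the top_k largest router logits.
    selected = sorted(range(len(router_logits)),
                      key=lambda i: router_logits[i],
                      reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(router_logits[i] for i in selected)
    exps = [math.exp(router_logits[i] - max_logit) for i in selected]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(selected, exps)]


random.seed(0)
# Stand-in router scores for one token; a real router computes these from the hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)
print(assignment)
print(f"fraction of experts active per token: {TOP_K / NUM_EXPERTS}")
```

Only the selected experts run for that token, which is why the active parameter count (the "A2.5B" in the model name) is much smaller than the total.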
This is the long-context base, produced from Mellum2-12B-A2.5B-Base-Pretrain by a layer-selective YaRN extension stage that re-maps RoPE frequencies on the global-attention layers only. It is the shared starting point for the released Instruct and Thinking variants.
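As a rough illustration of what a layer-selective frequency re-mapping could look like, the sketch below applies the per-dimension interpolation from YaRN ("NTK-by-parts") to RoPE inverse frequencies, and only on layers flagged as global attention. The head dimension, RoPE base, original context length, ramp bounds, and layer pattern are all hypothetical values chosen for the example; the card does not state them. YaRN's attention temperature scaling is omitted.

```python
import math

# All constants below are hypothetical illustration values, not Mellum2's configuration,
# except TARGET_CONTEXT and NUM_LAYERS, which come from the model card.
HEAD_DIM = 64
ROPE_BASE = 10_000.0
ORIGINAL_CONTEXT = 8_192
TARGET_CONTEXT = 131_072
NUM_LAYERS = 28
ALPHA, BETA = 1.0, 32.0  # ramp bounds; commonly used values, assumed here


def rope_inv_freqs(head_dim=HEAD_DIM, base=ROPE_BASE):
    """Standard RoPE inverse frequencies, one per pair of dimensions."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def yarn_inv_freqs(inv_freqs, scale, original_context=ORIGINAL_CONTEXT):
    """Blend interpolated and original frequencies per dimension."""
    remapped = []
    for theta in inv_freqs:
        wavelength = 2.0 * math.pi / theta
        rotations = original_context / wavelength  # full turns inside the original window
        # gamma = 0: fully interpolate (low frequencies); gamma = 1: keep unchanged (high frequencies).
        gamma = min(1.0, max(0.0, (rotations - ALPHA) / (BETA - ALPHA)))
        remapped.append((1.0 - gamma) * theta / scale + gamma * theta)
    return remapped


def freqs_for_layer(is_global_layer, scale):
    """Sliding-window layers keep the original frequencies; global layers are re-mapped."""
    base_freqs = rope_inv_freqs()
    return yarn_inv_freqs(base_freqs, scale) if is_global_layer else base_freqs


scale = TARGET_CONTEXT / ORIGINAL_CONTEXT
# Hypothetical pattern: every fourth layer uses full (global) attention.
layer_is_global = [(i + 1) % 4 == 0 for i in range(NUM_LAYERS)]
layer_freqs = [freqs_for_layer(g, scale) for g in layer_is_global]
```

The intuition is that sliding-window layers only ever see nearby positions, so their positional frequencies do not need to be stretched to cover the longer context.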
Mellum2 Model Family
This repository contains one checkpoint from the Mellum2 family.
| Checkpoint | Description |
|---|---|
| Base Pretrain | Base checkpoint before long-context extension |
| Base | Final base model after long-context extension (this checkpoint) |
| Instruct SFT | Supervised instruction-tuned checkpoint |
| Thinking SFT | Supervised thinking checkpoint |
| Instruct | RL-tuned instruction model |
| Thinking | RL-tuned thinking model |
Model Overview
Mellum2 Base has the following features:
- Number of Layers: 28
- Hidden Size: 2304
- Intermediate Size: 7168
- MoE Intermediate Size: 896
- Number of Experts: 64
- Number of Activated Experts: 8
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Context Length: 131,072
- Sliding Window: 1,024
- Vocabulary Size: 98,304
- Precision: bfloat16
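A few ratios follow directly from the figures above; the sketch below only does the arithmetic. The dictionary keys are ad hoc names for this example, not the field names of the model's actual configuration file.

```python
# Values copied from the feature list above; key names are illustrative only.
config = {
    "num_layers": 28,
    "num_experts": 64,
    "num_active_experts": 8,
    "num_q_heads": 32,
    "num_kv_heads": 4,
    "context_length": 131_072,
    "sliding_window": 1_024,
}

# Grouped-query attention: how many query heads share each key/value head.
q_heads_per_kv_head = config["num_q_heads"] // config["num_kv_heads"]

# Fraction of experts that run for any single token.
active_expert_fraction = config["num_active_experts"] / config["num_experts"]

# How many sliding windows fit end to end in the full context.
windows_per_context = config["context_length"] // config["sliding_window"]

print(q_heads_per_kv_head)     # 8
print(active_expert_fraction)  # 0.125
print(windows_per_context)     # 128
```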
Serving with vLLM
vllm serve JetBrains/Mellum2-12B-A2.5B-Base --max-model-len 131072
Quickstart
Text-Only Input (base model — use the completions endpoint, not chat)
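from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
# Point OPENAI_BASE_URL at the vLLM server, e.g. http://localhost:8000/v1.
client = OpenAI()

completion = client.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Base",
    prompt="def fibonacci(n):\n ",
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    # top_k is not an OpenAI API parameter; vLLM accepts it through extra_body.
    extra_body={
        "top_k": 20,
    },
)
# Print only the generated text rather than the whole response object.
print("Completion:", completion.choices[0].text)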
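Prompt and completion share one context window, so a long prompt leaves less room for `max_tokens`. Below is a minimal sketch of a helper that clamps the requested budget, assuming the server was started with `--max-model-len 131072` as in the serving command. `clamp_max_tokens` is a hypothetical helper, not part of any library, and counting the prompt's tokens (for example with the model's tokenizer) is left to the caller.

```python
MAX_MODEL_LEN = 131_072  # matches --max-model-len in the serving command


def clamp_max_tokens(prompt_tokens: int, requested: int,
                     max_model_len: int = MAX_MODEL_LEN) -> int:
    """Return the largest completion budget that still fits in the context window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max_model_len - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, remaining)


print(clamp_max_tokens(prompt_tokens=1_000, requested=81_920))    # 81920
print(clamp_max_tokens(prompt_tokens=100_000, requested=81_920))  # 31072
```

The returned value can be passed as `max_tokens` in the completions request.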
Evaluation
Mellum2 Base results compared with similarly sized open base models. All values are self-reported by JetBrains.
| Benchmark | Mellum2 (12B-A2.5B) | OLMo-3 (7B) | Qwen2.5 (7B) | Qwen3 (4B) | Qwen3.5 (4B) |
|---|---|---|---|---|---|
| Code Generation | | | | | |
| HumanEval | 41.5 | 45.1 | 55.5 | 57.3 | 50.0 |
| HumanEval+ | 37.2 | 39.6 | 47.0 | 51.2 | 43.9 |
| MBPP | 62.4 | 50.6 | 63.6 | 67.0 | 52.2 |
| MBPP+ | 61.4 | 52.9 | 64.0 | 64.5 | 55.0 |
| MultiPL-E (7 langs) | 21.0 | 10.0 | 19.2 | 26.0 | 12.1 |
| CRUXEval-I | 45.4 | 38.8 | 44.0 | 44.6 | 49.1 |
| CRUXEval-O | 43.9 | 36.6 | 42.9 | 43.5 | 43.2 |
| Knowledge & Reasoning | | | | | |
| MMLU | 70.9 | 62.1 | 71.8 | 71.1 | 74.2 |
| MMLU-Pro | 59.3 | 34.5 | 48.6 | 51.5 | 52.4 |
| BBH | 74.9 | 63.6 | 69.0 | 71.3 | 80.2 |
| ARC-Challenge | 53.5 | 53.6 | 51.3 | 51.2 | 54.9 |
| HellaSwag | 73.7 | 74.2 | 78.9 | 73.7 | 75.3 |
| WinoGrande | 65.5 | 69.5 | 73.3 | 71.2 | 70.8 |
| TruthfulQA MC2 | 44.5 | 47.0 | 56.4 | 53.5 | 52.1 |
| Math & Science | | | | | |
| GSM8K | 81.7 | 73.5 | 81.9 | 82.0 | 80.1 |
| MATH | 10.0 | 18.7 | 24.6 | 27.7 | 25.3 |
| GPQA Diamond | 31.3 | 28.8 | 32.8 | 36.9 | 41.4 |
| GPQA Main | 35.0 | 27.9 | 34.2 | 36.8 | 40.2 |
For more details, see the Mellum2 Technical Report.
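One way to read the table is to rank Mellum2 within each row. The sketch below does this for the code-generation rows only, with scores copied from the table; `rank_of` is a helper written for this example.

```python
# Code-generation rows copied from the evaluation table.
# Column order: Mellum2, OLMo-3, Qwen2.5, Qwen3, Qwen3.5.
MODELS = ["Mellum2", "OLMo-3", "Qwen2.5", "Qwen3", "Qwen3.5"]
SCORES = {
    "HumanEval":  [41.5, 45.1, 55.5, 57.3, 50.0],
    "HumanEval+": [37.2, 39.6, 47.0, 51.2, 43.9],
    "MBPP":       [62.4, 50.6, 63.6, 67.0, 52.2],
    "MBPP+":      [61.4, 52.9, 64.0, 64.5, 55.0],
    "MultiPL-E":  [21.0, 10.0, 19.2, 26.0, 12.1],
    "CRUXEval-I": [45.4, 38.8, 44.0, 44.6, 49.1],
    "CRUXEval-O": [43.9, 36.6, 42.9, 43.5, 43.2],
}


def rank_of(model, row):
    """1-based rank of `model` within one benchmark row (1 = highest score)."""
    own = row[MODELS.index(model)]
    return 1 + sum(score > own for score in row)


mellum_ranks = {bench: rank_of("Mellum2", row) for bench, row in SCORES.items()}
for bench, rank in mellum_ranks.items():
    print(f"{bench:<12} rank {rank} of {len(MODELS)}")
```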
License
Released under the Apache 2.0 license.