Libraries

How to use skilledu/Mellum2-12B-A2.5B-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="skilledu/Mellum2-12B-A2.5B-Base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("skilledu/Mellum2-12B-A2.5B-Base")
model = AutoModelForCausalLM.from_pretrained("skilledu/Mellum2-12B-A2.5B-Base")


vLLM

How to use skilledu/Mellum2-12B-A2.5B-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "skilledu/Mellum2-12B-A2.5B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skilledu/Mellum2-12B-A2.5B-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/skilledu/Mellum2-12B-A2.5B-Base

SGLang

How to use skilledu/Mellum2-12B-A2.5B-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "skilledu/Mellum2-12B-A2.5B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skilledu/Mellum2-12B-A2.5B-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "skilledu/Mellum2-12B-A2.5B-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skilledu/Mellum2-12B-A2.5B-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use skilledu/Mellum2-12B-A2.5B-Base with Docker Model Runner:
```
docker model run hf.co/skilledu/Mellum2-12B-A2.5B-Base
```


Duplicate from JetBrains/Mellum2-12B-A2.5B-Base


metadata

library_name: transformers
language:
  - en
model-index:
  - name: Mellum2 Base
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 41.46
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: humaneval_plus
          name: HumanEval+
        metrics:
          - name: pass@1
            type: pass@1
            value: 37.2
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: mbpp
          name: MBPP
        metrics:
          - name: pass@1
            type: pass@1
            value: 62.4
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: mbpp_plus
          name: MBPP+
        metrics:
          - name: pass@1
            type: pass@1
            value: 78.31
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: multipl-e
          name: MultiPL-E HumanEval, 7 languages
        metrics:
          - name: pass@1
            type: pass@1
            value: 20.97
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: cruxeval
          name: CRUXEval-I
        metrics:
          - name: pass@1
            type: pass@1
            value: 45.38
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: cruxeval
          name: CRUXEval-O
        metrics:
          - name: pass@1
            type: pass@1
            value: 43.88
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: cais/mmlu
          name: MMLU
        metrics:
          - name: accuracy
            type: acc
            value: 70.87
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: mmlu-pro
          name: MMLU-Pro
        metrics:
          - name: exact match
            type: exact_match
            value: 59.31
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: bbh
          name: BBH
        metrics:
          - name: exact match
            type: exact_match
            value: 74.9
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: ai2_arc
          name: ARC-Challenge
        metrics:
          - name: normalized accuracy
            type: acc_norm
            value: 53.5
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: hellaswag
          name: HellaSwag
        metrics:
          - name: normalized accuracy
            type: acc_norm
            value: 73.72
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: winogrande
          name: WinoGrande
        metrics:
          - name: accuracy
            type: acc
            value: 65.51
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: truthful_qa
          name: TruthfulQA MC2
        metrics:
          - name: MC2
            type: mc2
            value: 44.51
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: gsm8k
          name: GSM8K
        metrics:
          - name: exact match
            type: exact_match
            value: 81.73
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: hendrycks_math
          name: MATH
        metrics:
          - name: exact match
            type: exact_match
            value: 9.96
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: gpqa
          name: GPQA Diamond
        metrics:
          - name: accuracy
            type: acc
            value: 31.31
            verified: false
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: gpqa
          name: GPQA Main
        metrics:
          - name: accuracy
            type: acc
            value: 35.04
            verified: false
license: apache-2.0

Mellum2 Base

Use this checkpoint as the starting point for your own fine-tuning, alignment, or domain adaptation on top of the long-context base. For instruction-following or reasoning tasks out of the box, use the Instruct or Thinking checkpoint instead.

Mellum2 Base Highlights

Mellum2 Base is a long-context pretrained causal language model trained by JetBrains.

The model uses a Mixture-of-Experts architecture with 64 experts and activates 8 experts per token. It uses a combination of sliding-window and full attention layers, with a context length of 131,072 tokens.
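The routing described above can be illustrated with a minimal, standard-library sketch of top-k expert selection for a single token. Only the expert counts (64 total, 8 active) come from this model card. The random router logits, the `route_token` helper, and the choice to normalize weights with a softmax over the selected experts are assumptions made for illustration, not the documented Mellum2 router.

```python
import math
import random

NUM_EXPERTS = 64  # from the model card
TOP_K = 8         # experts activated per token, from the model card


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_ids, weights).

    Assumption: weights are a softmax over the selected logits only.
    The model card does not state how the real router normalizes them.
    """
    # Indices of the top_k largest logits, highest first.
    expert_ids = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    max_logit = max(router_logits[i] for i in expert_ids)
    exps = [math.exp(router_logits[i] - max_logit) for i in expert_ids]
    total = sum(exps)
    weights = [e / total for e in exps]
    return expert_ids, weights


# Hypothetical router output for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

expert_ids, weights = route_token(logits)
print(f"active experts: {len(expert_ids)} of {NUM_EXPERTS}")
print(f"fraction of experts used per token: {TOP_K / NUM_EXPERTS:.3f}")
```

Because only 8 of the 64 experts run for each token, the parameters touched per token are a fraction of the total, which is consistent with the "12B-A2.5B" naming.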

This is the long-context base, produced from Mellum2-12B-A2.5B-Base-Pretrain by a layer-selective YaRN extension stage that re-maps RoPE frequencies on the global-attention layers only. It is the shared starting point for the released Instruct and Thinking variants.

Mellum2 Model Family

This repository contains one checkpoint from the Mellum2 family.

| Checkpoint | Description |
|---|---|
| Base Pretrain | Base checkpoint before long-context extension |
| Base | Final base model |
| Instruct SFT | Supervised instruction-tuned checkpoint |
| Thinking SFT | Supervised thinking checkpoint |
| Instruct | RL-tuned instruction model |
| Thinking | RL-tuned thinking model |

Model Overview

Mellum2 Base has the following features:

Number of Layers: 28
Hidden Size: 2304
Intermediate Size: 7168
MoE Intermediate Size: 896
Number of Experts: 64
Number of Activated Experts: 8
Number of Attention Heads (GQA): 32 for Q and 4 for KV
Context Length: 131,072
Sliding Window: 1,024
Vocabulary Size: 98,304
Precision: bfloat16
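To make the attention figures above concrete, the following sketch works through two pieces of arithmetic: how many query heads share each KV head under GQA, and how many key positions a query can see in a sliding-window layer versus a full-attention layer. The constants come from the list above. The helper names are hypothetical, and two conventions are assumed rather than confirmed by the model card: consecutive query heads share a KV head, and the window counts the current token.

```python
NUM_Q_HEADS = 32          # from the model card
NUM_KV_HEADS = 4          # from the model card
SLIDING_WINDOW = 1_024    # from the model card
CONTEXT_LENGTH = 131_072  # from the model card


def kv_head_for(q_head):
    """Map a query head index to the KV head it reads from.

    Assumption: consecutive query heads are grouped onto one KV head.
    """
    group_size = NUM_Q_HEADS // NUM_KV_HEADS
    return q_head // group_size


def keys_visible(position, window=None):
    """Count key positions a query at 0-based `position` can attend to.

    With window=None this is plain causal attention. With a window, the
    query sees at most `window` positions, including itself (assumed).
    """
    causal = position + 1
    return causal if window is None else min(causal, window)


last = CONTEXT_LENGTH - 1
print("query heads per KV head:", NUM_Q_HEADS // NUM_KV_HEADS)
print("KV head for query head 13:", kv_head_for(13))
print("keys visible at last position, full attention:", keys_visible(last))
print("keys visible at last position, sliding window:",
      keys_visible(last, SLIDING_WINDOW))
```

Under these assumptions, a sliding-window layer looks at 1,024 positions at the end of a full-length context, while a full-attention layer looks at all 131,072.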

Serving with vLLM

vllm serve JetBrains/Mellum2-12B-A2.5B-Base --max-model-len 131072

Quickstart

Text-Only Input (base model — use the completions endpoint, not chat)

from openai import OpenAI
# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = OpenAI()

completion = client.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Base",
    prompt="def fibonacci(n):\n    ",
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "top_k": 20,
    },
)
print("Completion:", completion.choices[0].text)
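If the `openai` package is not installed, the same completions request can be built with the Python standard library. This is a sketch under stated assumptions: the server address `http://localhost:8000/v1` assumes a vLLM server running locally on its default port, `build_completion_request` is a hypothetical helper, and the small `max_tokens` value is chosen only to keep the example short. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: a vLLM server is running locally on its default port.
BASE_URL = "http://localhost:8000/v1"
MODEL = "JetBrains/Mellum2-12B-A2.5B-Base"


def build_completion_request(prompt, max_tokens=128):
    """Build (but do not send) a POST request for the completions endpoint.

    The sampling values mirror the quickstart above; top_k goes in the
    JSON body next to the standard OpenAI fields.
    """
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("def fibonacci(n):\n    ")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and read the generated text from `choices[0]["text"]` in the decoded JSON response.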

Evaluation

Mellum2 Base results compared with similarly sized open base models. All values are self-reported by JetBrains.

| Benchmark | Mellum2 (12B-A2.5B) | OLMo-3 (7B) | Qwen2.5 (7B) | Qwen3 (4B) | Qwen3.5 (4B) |
|---|---|---|---|---|---|
| **Code Generation** | | | | | |
| HumanEval | 41.5 | 45.1 | 55.5 | 57.3 | 50.0 |
| HumanEval+ | 37.2 | 39.6 | 47.0 | 51.2 | 43.9 |
| MBPP | 62.4 | 50.6 | 63.6 | 67.0 | 52.2 |
| MBPP+ | 61.4 | 52.9 | 64.0 | 64.5 | 55.0 |
| MultiPL-E (7 langs) | 21.0 | 10.0 | 19.2 | 26.0 | 12.1 |
| CRUXEval-I | 45.4 | 38.8 | 44.0 | 44.6 | 49.1 |
| CRUXEval-O | 43.9 | 36.6 | 42.9 | 43.5 | 43.2 |
| **Knowledge & Reasoning** | | | | | |
| MMLU | 70.9 | 62.1 | 71.8 | 71.1 | 74.2 |
| MMLU-Pro | 59.3 | 34.5 | 48.6 | 51.5 | 52.4 |
| BBH | 74.9 | 63.6 | 69.0 | 71.3 | 80.2 |
| ARC-Challenge | 53.5 | 53.6 | 51.3 | 51.2 | 54.9 |
| HellaSwag | 73.7 | 74.2 | 78.9 | 73.7 | 75.3 |
| WinoGrande | 65.5 | 69.5 | 73.3 | 71.2 | 70.8 |
| TruthfulQA MC2 | 44.5 | 47.0 | 56.4 | 53.5 | 52.1 |
| **Math & Science** | | | | | |
| GSM8K | 81.7 | 73.5 | 81.9 | 82.0 | 80.1 |
| MATH | 10.0 | 18.7 | 24.6 | 27.7 | 25.3 |
| GPQA Diamond | 31.3 | 28.8 | 32.8 | 36.9 | 41.4 |
| GPQA Main | 35.0 | 27.9 | 34.2 | 36.8 | 40.2 |

For more details, see the Mellum2 Technical Report.

License

Released under the Apache 2.0 license.