Gemma2-9B-BPMG-IT

Gemma2-9B-BPMG-IT is an instruction-tuned language model that converts natural-language business process descriptions into BPMN models rendered in Graphviz DOT. It is a LoRA adaptation of google/gemma-2-9b-it, trained on a cleaned subset of the MaD dataset for the paper:

Generating Business Process Models with Open Source Large Language Models using Instruction Tuning. Gökberk Çelikmasat, Atay Özgövde, Fatma Başak Aydemir. International Conference on Product-Focused Software Process Improvement (PROFES 2025), Springer LNCS, pp. 269–284. DOI: 10.1007/978-3-032-12089-2_17

This model is the subject of our PROFES 2025 conference paper, in which we introduced the instruction-tuning approach and evaluated it against open-weight and proprietary baselines on textual and structural metrics. It is also the BPMG-IT baseline in our subsequent journal paper:

InstruBPM: Instruction-Tuning Open-Weight Language Models for BPMN Model Generation. Çelikmasat, Özgövde, Aydemir. Software and Systems Modeling, under review, 2026. arXiv: 2512.12063

For new projects we recommend the successor model, gcelikmasat-work/Qwen3_4B_BPMN_IT, which matches this model's accuracy with roughly half the parameter count (4B vs. 9B) and ships with quantized and merge-scale variants for deployment trade-offs.

Results

Evaluated on the 180-instance stratified benchmark (spanning 15 business domains) used in the InstruBPM journal paper (Table 2), this model attains:

Metric              Score
BLEU                82.98
ROUGE-L             94.61
METEOR              92.67
R-GED Accuracy (%)  97.78

These scores are very close to those of the newer 4B Qwen3 successor (83.06 / 94.43 / 92.82 / 99.44 on the same benchmark), while this 9B model requires more than twice the memory and compute at inference time.
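
R-GED accuracy compares generated and reference models at the graph level rather than token by token. As a rough illustration only (not the exact procedure from the papers), one could compute a relative graph edit distance between two DOT graphs with pydot and networkx; both packages and the normalization below are assumptions:

import networkx as nx
import pydot
from networkx.drawing import nx_pydot

def relative_ged(reference_dot: str, generated_dot: str) -> float:
    # Parse both DOT strings and convert them to networkx graphs
    ref = nx_pydot.from_pydot(pydot.graph_from_dot_data(reference_dot)[0])
    gen = nx_pydot.from_pydot(pydot.graph_from_dot_data(generated_dot)[0])
    # Exact graph edit distance; fine for small process graphs, slow for large ones
    ged = nx.graph_edit_distance(ref, gen)
    # Normalize by the size of the reference graph
    return ged / max(ref.number_of_nodes() + ref.number_of_edges(), 1)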

Intended use

Generate first-draft BPMN models from textual process descriptions to accelerate early-stage modeling. Intended as an assistant for business process modelers and analysts; human review remains recommended, particularly for gateway logic and activity labels.

Supported BPMN subset

The model generates BPMN process fragments in DOT notation covering: start events, end events, tasks (activities), sequence flows, and AND/XOR gateways (splits and joins). It does not generate pools, lanes, message flows, data objects, intermediate/boundary events, sub-processes, or annotations.
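
To make the supported constructs concrete, the sketch below builds a comparable control-flow fragment with the graphviz Python package. The node names, shapes, and attributes are illustrative assumptions and do not necessarily match the model's exact output conventions:

import graphviz

g = graphviz.Digraph("loan_process")
# Start event, tasks, an XOR split, and two end events
g.node("start", "Start", shape="circle")
g.node("review", "Review application", shape="box")
g.node("xor", "XOR", shape="diamond")
g.node("disburse", "Disburse loan", shape="box")
g.node("reject", "Send rejection letter", shape="box")
g.node("end_ok", "End", shape="doublecircle")
g.node("end_nok", "End", shape="doublecircle")
# Sequence flows (unlabeled edges)
g.edge("start", "review")
g.edge("review", "xor")
g.edge("xor", "disburse")
g.edge("xor", "reject")
g.edge("disburse", "end_ok")
g.edge("reject", "end_nok")
print(g.source)  # the DOT text, similar in spirit to what the model emits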

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "gcelikmasat-work/gemma-2-9b-it-BPMN"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

instruction = (
    "You are an expert in BPMN modeling and DOT language. Your task is to "
    "convert detailed textual descriptions of business processes into accurate "
    "BPMN model codes written in DOT language. Label all nodes with their "
    "activity names. Represent all connections between nodes without labeling "
    "the connections. Represent each node and its connections accurately, "
    "ensuring all decision points and flows are included and connected. "
    "Now, generate BPMN business process model code in DOT language for the "
    "following textual description of a business process: "
)

description = (
    "The process begins when the customer submits an application. After submission, "
    "the application is reviewed by the credit officer. If the application is approved, "
    "the loan is disbursed. Otherwise, a rejection letter is sent. The process ends."
)

# Gemma 2 uses a single user turn without a separate system role
messages = [{"role": "user", "content": instruction + description}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Low-temperature sampling keeps the generated DOT stable and close to greedy decoding
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2048, temperature=0.1, top_p=1.0, do_sample=True)

# Decode only the newly generated tokens, i.e. the DOT code
dot_code = tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(dot_code)
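
If Graphviz is available, the generated DOT can be rendered directly. The snippet below is a minimal sketch that assumes the graphviz Python package, the Graphviz system binaries, and that the model's output is syntactically valid DOT:

import graphviz

# Writes bpmn_model.png next to the script; raises if dot_code is not valid DOT
graphviz.Source(dot_code).render("bpmn_model", format="png", cleanup=True)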

Training

Trained with LLaMA-Factory using LoRA on Gemma-2 9B Instruct. Detailed hyperparameters are reported in the PROFES 2025 paper. Training data: 21.5k cleaned instruction–input–output triples from MaD, split 80/10/10 for train/validation/test. The full splits are available at gcelikmasat-work/BPMN-IT-Dataset.
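
Each example pairs a fixed modeling instruction with a process description and the target DOT code. A hypothetical record (field names and content here are illustrative, not copied from the dataset) looks roughly like this:

example = {
    "instruction": "You are an expert in BPMN modeling and DOT language. Convert the following "
                   "textual description of a business process into BPMN model code in DOT language.",
    "input": "The process begins when the customer submits an application. ...",
    "output": 'digraph {\n  "Start" -> "Submit application"\n  ...\n}',
}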

Limitations

  • Scope. Control-flow slice of BPMN only (tasks, events, sequence flows, AND/XOR gateways). No pools, lanes, message flows, data objects, or sub-processes.
  • Language. English only.
  • Parameter efficiency. At 9B parameters, this model is roughly twice the size of the Qwen3-4B successor for comparable accuracy. For deployment-constrained settings, the 4B successor is preferred.
  • Semantic equivalence. Structural similarity does not imply semantic equivalence, especially when input descriptions are ambiguous.

Citation

If you use this model, please cite the PROFES 2025 paper:

@inproceedings{celikmasat2025bpmg,
  title     = {Generating Business Process Models with Open Source Large Language Models using Instruction Tuning},
  author    = {{\c{C}}elikmasat, G{\"o}kberk and {\"O}zg{\"o}vde, Atay and Aydemir, Fatma Ba{\c{s}}ak},
  booktitle = {Product-Focused Software Process Improvement (PROFES 2025)},
  series    = {Lecture Notes in Computer Science},
  pages     = {269--284},
  year      = {2025},
  publisher = {Springer},
  doi       = {10.1007/978-3-032-12089-2_17},
  url       = {https://doi.org/10.1007/978-3-032-12089-2_17}
}

If you are comparing against this model as a baseline in a follow-up study, please also cite the journal extension:

@article{celikmasat2026instrubpm,
  title   = {InstruBPM: Instruction-Tuning Open-Weight Language Models for BPMN Model Generation},
  author  = {{\c{C}}elikmasat, G{\"o}kberk and {\"O}zg{\"o}vde, Atay and Aydemir, Fatma Ba{\c{s}}ak},
  journal = {Software and Systems Modeling},
  year    = {2026},
  note    = {Under review. arXiv:2512.12063},
  url     = {https://arxiv.org/abs/2512.12063}
}

Please also cite the source dataset:

@inproceedings{li2023mad,
  title     = {{MaD}: A Dataset for Interview-based {BPM} in Business Process Management},
  author    = {Li, Xiang and Ni, Lijuan and Li, Ran and Liu, Jiafei and Zhang, Ming},
  booktitle = {2023 International Joint Conference on Neural Networks (IJCNN)},
  pages     = {1--8},
  year      = {2023},
  publisher = {IEEE}
}

License

Released under the Gemma license, inherited from the base model (google/gemma-2-9b-it). Use is subject to Google's Gemma Prohibited Use Policy. The training data is distributed separately under the MaD dataset's own terms.
