Gemma2-9B-BPMG-IT

Gemma2-9B-BPMG-IT is an instruction-tuned language model that converts natural-language business process descriptions into BPMN models rendered in Graphviz DOT. It is a LoRA adaptation of google/gemma-2-9b-it, trained on a cleaned subset of the MaD dataset for the paper:

Generating Business Process Models with Open Source Large Language Models using Instruction Tuning. Gökberk Çelikmasat, Atay Özgövde, Fatma Başak Aydemir. International Conference on Product-Focused Software Process Improvement (PROFES 2025), Springer LNCS, pp. 269–284. DOI: 10.1007/978-3-032-12089-2_17

This model is the subject of our PROFES 2025 conference paper, in which we introduced the instruction-tuning approach and evaluated it against open-weight and proprietary baselines on textual and structural metrics. It is also the BPMG-IT baseline in our subsequent journal paper:

InstruBPM: Instruction-Tuning Open-Weight Language Models for BPMN Model Generation. Çelikmasat, Özgövde, Aydemir. Software and Systems Modeling, under review, 2026. arXiv: 2512.12063

For new projects we recommend the successor model, gcelikmasat-work/Qwen3_4B_BPMN_IT, which matches this model's accuracy with roughly half the parameter count (4B vs. 9B) and ships with quantized and merge-scale variants for deployment trade-offs.

Results

Evaluated on the 180-instance stratified benchmark (spanning 15 business domains) used in the InstruBPM journal paper (Table 2), this model attains:

Metric              Score
BLEU                82.98
ROUGE-L             94.61
METEOR              92.67
R-GED Accuracy (%)  97.78

These scores are very close to those of the newer 4B Qwen3 successor (83.06 / 94.43 / 92.82 / 99.44 on the same benchmark), while this 9B model requires more than twice the memory and compute at inference time.
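
R-GED accuracy compares generated and reference models at the graph level rather than token by token. As a rough illustration only (not the exact procedure from the papers), one could compute a relative graph edit distance between two DOT graphs with pydot and networkx; both packages and the normalization below are assumptions:

import networkx as nx
import pydot
from networkx.drawing import nx_pydot

def relative_ged(reference_dot: str, generated_dot: str) -> float:
    # Parse both DOT strings and convert them to networkx graphs
    ref = nx_pydot.from_pydot(pydot.graph_from_dot_data(reference_dot)[0])
    gen = nx_pydot.from_pydot(pydot.graph_from_dot_data(generated_dot)[0])
    # Exact graph edit distance; fine for small process graphs, slow for large ones
    ged = nx.graph_edit_distance(ref, gen)
    # Normalize by the size of the reference graph
    return ged / max(ref.number_of_nodes() + ref.number_of_edges(), 1)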

Intended use

Generate first-draft BPMN models from textual process descriptions to accelerate early-stage modeling. Intended as an assistant for business process modelers and analysts; human review remains recommended, particularly for gateway logic and activity labels.

Supported BPMN subset

The model generates BPMN process fragments in DOT notation covering: start events, end events, tasks (activities), sequence flows, and AND/XOR gateways (splits and joins). It does not generate pools, lanes, message flows, data objects, intermediate/boundary events, sub-processes, or annotations.
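
To make the supported constructs concrete, the sketch below builds a comparable control-flow fragment with the graphviz Python package. The node names, shapes, and attributes are illustrative assumptions and do not necessarily match the model's exact output conventions:

import graphviz

g = graphviz.Digraph("loan_process")
# Start event, tasks, an XOR split, and two end events
g.node("start", "Start", shape="circle")
g.node("review", "Review application", shape="box")
g.node("xor", "XOR", shape="diamond")
g.node("disburse", "Disburse loan", shape="box")
g.node("reject", "Send rejection letter", shape="box")
g.node("end_ok", "End", shape="doublecircle")
g.node("end_nok", "End", shape="doublecircle")
# Sequence flows (unlabeled edges)
g.edge("start", "review")
g.edge("review", "xor")
g.edge("xor", "disburse")
g.edge("xor", "reject")
g.edge("disburse", "end_ok")
g.edge("reject", "end_nok")
print(g.source)  # the DOT text, similar in spirit to what the model emits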

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "gcelikmasat-work/gemma-2-9b-it-BPMN"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

instruction = (
    "You are an expert in BPMN modeling and DOT language. Your task is to "
    "convert detailed textual descriptions of business processes into accurate "
    "BPMN model codes written in DOT language. Label all nodes with their "
    "activity names. Represent all connections between nodes without labeling "
    "the connections. Represent each node and its connections accurately, "
    "ensuring all decision points and flows are included and connected. "
    "Now, generate BPMN business process model code in DOT language for the "
    "following textual description of a business process: "
)

description = (
    "The process begins when the customer submits an application. After submission, "
    "the application is reviewed by the credit officer. If the application is approved, "
    "the loan is disbursed. Otherwise, a rejection letter is sent. The process ends."
)

# Gemma 2 uses a single user turn without a separate system role
messages = [{"role": "user", "content": instruction + description}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Low-temperature sampling keeps the generated DOT stable and close to greedy decoding
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2048, temperature=0.1, top_p=1.0, do_sample=True)

# Decode only the newly generated tokens, i.e. the DOT code
dot_code = tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(dot_code)
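
If Graphviz is available, the generated DOT can be rendered directly. The snippet below is a minimal sketch that assumes the graphviz Python package, the Graphviz system binaries, and that the model's output is syntactically valid DOT:

import graphviz

# Writes bpmn_model.png next to the script; raises if dot_code is not valid DOT
graphviz.Source(dot_code).render("bpmn_model", format="png", cleanup=True)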

Training

Trained with LLaMA-Factory using LoRA on Gemma-2 9B Instruct. Detailed hyperparameters are reported in the PROFES 2025 paper. Training data: 21.5k cleaned instruction–input–output triples from MaD, split 80/10/10 for train/validation/test. The full splits are available at gcelikmasat-work/BPMN-IT-Dataset.
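
Each example pairs a fixed modeling instruction with a process description and the target DOT code. A hypothetical record (field names and content here are illustrative, not copied from the dataset) looks roughly like this:

example = {
    "instruction": "You are an expert in BPMN modeling and DOT language. Convert the following "
                   "textual description of a business process into BPMN model code in DOT language.",
    "input": "The process begins when the customer submits an application. ...",
    "output": 'digraph {\n  "Start" -> "Submit application"\n  ...\n}',
}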

Limitations

  • Scope. Control-flow slice of BPMN only (tasks, events, sequence flows, AND/XOR gateways). No pools, lanes, message flows, data objects, or sub-processes.
  • Language. English only.
  • Parameter efficiency. At 9B parameters, this model is roughly twice the size of the Qwen3-4B successor for comparable accuracy. For deployment-constrained settings, the 4B successor is preferred.
  • Semantic equivalence. Structural similarity does not imply semantic equivalence, especially when input descriptions are ambiguous.

Citation

If you use this model, please cite the PROFES 2025 paper:

@inproceedings{celikmasat2025bpmg,
  title     = {Generating Business Process Models with Open Source Large Language Models using Instruction Tuning},
  author    = {{\c{C}}elikmasat, G{\"o}kberk and {\"O}zg{\"o}vde, Atay and Aydemir, Fatma Ba{\c{s}}ak},
  booktitle = {Product-Focused Software Process Improvement (PROFES 2025)},
  series    = {Lecture Notes in Computer Science},
  pages     = {269--284},
  year      = {2025},
  publisher = {Springer},
  doi       = {10.1007/978-3-032-12089-2_17},
  url       = {https://doi.org/10.1007/978-3-032-12089-2_17}
}

If you are comparing against this model as a baseline in a follow-up study, please also cite the journal extension:

@article{celikmasat2026instrubpm,
  title   = {InstruBPM: Instruction-Tuning Open-Weight Language Models for BPMN Model Generation},
  author  = {{\c{C}}elikmasat, G{\"o}kberk and {\"O}zg{\"o}vde, Atay and Aydemir, Fatma Ba{\c{s}}ak},
  journal = {Software and Systems Modeling},
  year    = {2026},
  note    = {Under review. arXiv:2512.12063},
  url     = {https://arxiv.org/abs/2512.12063}
}

Please also cite the source dataset:

@inproceedings{li2023mad,
  title     = {{MaD}: A Dataset for Interview-based {BPM} in Business Process Management},
  author    = {Li, Xiang and Ni, Lijuan and Li, Ran and Liu, Jiafei and Zhang, Ming},
  booktitle = {2023 International Joint Conference on Neural Networks (IJCNN)},
  pages     = {1--8},
  year      = {2023},
  publisher = {IEEE}
}

License

Released under the Gemma license, inherited from the base model (google/gemma-2-9b-it). Use is subject to Google's Gemma Prohibited Use Policy. The training data is distributed separately under the MaD dataset's own terms.
