ThermoGFN-IF for Catalysis: A Protein Sequence Design Model Tuned with GFlowNets for Stable Protein Design and Kinetic-Aware Enzyme Engineering

Community Article Published March 10, 2026

Paper

TLDR: This post describes the ThermoGFN-IF sequence design model architecture and training methodology for conditioning sequence design on kinetic parameters (catalytic activity), thermostability, and binding affinity.

ThermoGFN-IF is an ambitious vision for protein design. At its core, the project proposes taking ADFLIP and LigandMPNN, both all-atom inverse folding models, and fine-tuning them with Generative Flow Networks (GFlowNets) so that they do not merely reconstruct plausible sequences, but actively sample diverse, high-value protein variants under a multi-fidelity oracle stack. In the base thermostability formulation, that stack is tri-fidelity: SPURS for cheap mutation-level stability scoring, BioEmu for fast whole-chain equilibrium diagnostics and trajectory generation, and UMA for higher-fidelity MLIP-MD rescoring with ligand awareness and binding-affinity feedback. The result is a system aimed not at one brittle optimum, but at a distribution of promising candidates spanning multiple peaks in protein fitness space. The more robust version, which we focus on first in our implementation, is GFlowNet-style RL fine-tuning conditioned on catalytic parameters, with the oracle models GraphKcat, KcatNet, and MMKcat allowing us to tune a sequence design model that is conditioned not only on molecular dynamics trajectories or equilibrium ensembles, but also on catalytic activity.
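The oracle stack can be pictured as one scoring interface offered at three price points. Below is a minimal Python sketch of that idea; the `Oracle` wrapper, the relative `cost` values, and the stub scorer are all hypothetical stand-ins (the real SPURS/BioEmu/UMA calls are not shown):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Oracle:
    """A named scorer with a relative compute cost per call."""
    name: str
    cost: float                      # relative compute cost (illustrative)
    score: Callable[[str], float]    # sequence -> scalar score

def build_stack() -> Dict[str, Oracle]:
    # Stub scorer: hydrophobic fraction of the sequence, used only as a
    # placeholder signal so the sketch runs end to end.
    hydrophobic = set("AILMFWVY")
    def fake_score(seq: str) -> float:
        return sum(aa in hydrophobic for aa in seq) / max(len(seq), 1)
    return {
        "SPURS":  Oracle("SPURS",  cost=1.0,   score=fake_score),  # cheap mutation-level screen
        "BioEmu": Oracle("BioEmu", cost=10.0,  score=fake_score),  # ensemble diagnostics
        "UMA":    Oracle("UMA",    cost=100.0, score=fake_score),  # MLIP-MD rescoring
    }
```

The point of the shared interface is that the training loop can treat every fidelity level uniformly and budget calls by `cost`.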

The project explicitly leans on the central GFlowNet idea that terminal designs should be sampled in proportion to reward rather than driven toward a single maximizer, which could lead to reward hacking (see Edward Hu's post). In the GFlowNet-style RL fine-tuning, the target distribution is written as $p_\theta(x \mid B,\beta,w) \propto R_w(x \mid B)^\beta$, which is exactly the formalism needed when protein landscapes are noisy, multimodal, and full of compensatory trade-offs among $k_{cat}$, $K_m$, and $T_m$ values. A greedy search can become trapped in one local story. A GFlowNet can tell you there are five good stories, and that is much closer to how real protein engineering works.
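To make the reward-proportional idea concrete, here is a toy version of the trajectory-balance (TB) objective commonly used to train GFlowNets. This is a generic TB sketch, not the project's actual loss: `log_z` plays the role of the learned log-partition estimate, and the trajectory numbers are illustrative.

```python
import math

def tb_loss(log_z, log_pf_steps, log_pb_steps, log_reward, beta=1.0):
    """Toy trajectory-balance loss for sampling p(x) ∝ R(x)^beta.

    log_pf_steps / log_pb_steps: per-step forward / backward policy
    log-probabilities along one trajectory; log_z: log-partition estimate.
    """
    # TB residual: log Z + Σ log P_F − beta·log R(x) − Σ log P_B
    residual = log_z + sum(log_pf_steps) - beta * log_reward - sum(log_pb_steps)
    return residual ** 2

# A perfectly balanced one-step trajectory has zero loss:
# Z = 2, P_F = 0.5, P_B = 1, R = 1  →  log 2 + log(1/2) − 0 − 0 = 0.
loss = tb_loss(math.log(2.0), [math.log(0.5)], [0.0], 0.0)
```

Raising `beta` sharpens the target distribution toward high-reward modes without collapsing it to a single maximizer, which is the knob the $R_w(x \mid B)^\beta$ tempering exposes.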

*(Animation: GFlowNet sampling)*

The base ThermoGFN-IF architecture is already strong. ADFLIP gives the system an all-atom, ligand-aware, multi-state generator substrate, and we also use LigandMPNN as an ablation to ascertain how much equilibrium-ensemble conditioning versus static-structure conditioning changes the training outcome;



SPURS supplies dense mutation landscapes, explicit epistatic supervision, and a rough initial thermostability screen for large batches of generated protein sequences;


BioEmu, a diffusion-based model that generates molecular dynamics trajectories orders of magnitude faster than traditional MD, provides fast ensemble-level conformational screening;

*(Figure: BioEmu-generated trajectory of PDB 4FZA, surface rendering, 60 frames)*

The Universal Model for Atoms, or "UMA", is a machine learning interatomic potential (MLIP) trained to quantum-chemical (ωB97M-V/def2-TZVPD) accuracy. It can generate high-quality molecular dynamics trajectories far faster than classical MD. UMA adds a higher-cost but more physically grounded branch for rescoring systems within the paper's practical atom-count setting (we restrict to small and medium-sized systems for now), and it also lets us estimate quantities such as attack geometry, binding affinity, and thermostability with greater fidelity.

*(Figure: MD trajectory rendered in PyMOL)*

For explicit ligands and multimers, UMA is the optimal scoring oracle.

The moment you look at enzyme engineering rather than generic fold stabilization, a glaring truth appears: stability and binding are not enough. A protein can be beautifully folded and still be catalytically mediocre. An industrial enzyme can survive heat and solvent and still turn over substrate too slowly to matter. That is where the extension to catalytic oracles becomes very exciting. By adding GraphKcat, KcatNet, and MMKcat as catalytic oracles, ThermoGFN-IF stops being merely a thermostability-and-affinity design engine and starts to look like a platform for kinetic-aware enzyme generation.

GraphKcat is the most structurally suggestive of the three. It is introduced as a deep learning framework that integrates enzyme–substrate 3D binding conformations for kinetic-parameter prediction, using Chai-1-generated enzyme–substrate complexes, a hierarchical graph network that moves from all-atom to coarse-grained representations, and a multimodal cross-attention fusion module to combine structural, sequence, substrate, and environmental information. Just as important, the authors emphasize that it can identify catalysis-critical residues and maintain robustness under low sequence similarity. In other words, GraphKcat is not just a scalar regressor; it is a pocket-aware catalytic oracle. That makes it a natural expert for conditioning on quantities tied to active-site geometry and enzyme–substrate recognition, including $K_m$-like behavior and catalytic efficiency.

KcatNet plays a different role. It is a geometric deep learning framework for genome-scale $k_{cat}$ prediction that takes enzyme sequences and substrate SMILES as inputs, builds residue-level enzyme graphs with language-model-derived features and contact maps, partitions the enzyme into local regions, and then uses iterative interaction modeling between substrate fingerprints and enzyme regions to predict turnover. The paper emphasizes that KcatNet captures site-specific enzyme–substrate interaction patterns, identifies important residues, outperforms earlier baselines such as DLKcat and UniKP on its benchmark, and can discriminate wild-type from mutant catalytic behavior. Conceptually, KcatNet is the scale oracle: less about a single exquisite catalytic pocket analysis and more about broad, fast, biologically informed turnover prediction across large enzyme spaces.

MMKcat, by contrast, is the robustness oracle. Its defining move is to treat enzyme sequence and substrate as essential modalities while allowing other terms—especially reaction-product information—to be maskable during training and inference. The authors designed it specifically to address missing-modality conditions, arguing that many $k_{cat}$ predictors ignore the effect of products and collapse when key inputs are unavailable. MMKcat adds a prior-guided missing-modality mechanism plus an auxiliary regularizer so that prediction remains useful even when the real world is incomplete, messy, or experimentally under-annotated. It was reported to outperform a slate of prior $k_{cat}$ models on BRENDA and SABIO-RK under both complete and missing-modality settings. In a practical design loop, that makes MMKcat the fallback oracle you want when your catalytic dataset is imperfect—which, in enzyme engineering, is most of the time.

Once these three catalytic models are brought into ThermoGFN-IF, the manuscript’s existing target-conditioned design machinery becomes much more powerful. The project defines a target vector $c$ and explicitly supports bounded-retry inference for designs that must satisfy requested property tolerances, where those targets are framed around thermostability and binding. But the exact same mechanism can be extended to a catalytic target vector such as

$$ y^\star = \big(T_m^\star,\ \Delta G_{\mathrm{bind}}^\star,\ k_{cat}^\star,\ K_m^\star,\ (k_{cat}/K_m)^\star\big). $$

Now the model is not merely asked to be “more stable” or “better bound.” It is asked to hit a kinetic profile.
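A minimal sketch of the bounded-retry idea follows; `sample_design` and `predict_properties` are hypothetical stand-ins for the generator and the oracle ensemble, and the per-property tolerance normalization is an assumption of this sketch, not the manuscript's exact rule:

```python
def bounded_retry(sample_design, predict_properties, target, tol, max_retries=8):
    """Resample until every targeted property is within tolerance, or the
    retry budget runs out; then fall back to the closest design seen."""
    best, best_err = None, float("inf")
    for _ in range(max_retries):
        x = sample_design()
        pred = predict_properties(x)
        # Worst-case tolerance-normalized deviation across targeted properties.
        err = max(abs(pred[k] - target[k]) / tol[k] for k in target)
        if err <= 1.0:                # all requested properties satisfied
            return x, pred
        if err < best_err:
            best, best_err = x, err
    return best, predict_properties(best)
```

Because the GFlowNet samples a distribution rather than a point estimate, retries draw genuinely different candidates, which is what makes a bounded retry budget useful at all.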

This is where the GFlowNet formulation becomes more than elegant math. A natural extension of the paper’s scalarized reward is something like

$$ R(x)=\exp\!\Big( \lambda_{\mathrm{stab}}\, s_{\mathrm{stab}}(x) + \lambda_{\mathrm{bind}}\, s_{\mathrm{bind}}(x) + \lambda_{k_{cat}}\, s_{k_{cat}}(x) - \lambda_{K_m}\, s_{K_m}(x) + \lambda_{\mathrm{eff}}\, s_{k_{cat}/K_m}(x) - \lambda_{\mathrm{unc}}\, u(x) - \lambda_{\mathrm{pack}}\, u_{\mathrm{pack}}(x) \Big), $$

where the stability terms come from SPURS, BioEmu, and UMA, while the catalytic terms are supplied by GraphKcat, KcatNet, MMKcat, and UMA. That kind of reward does not push the system toward a single narrow optimum. It lets the model explore multiple viable compromises between foldability, binding, turnover, and substrate handling: exactly the compromise space that real enzyme engineering lives inside. The equation here is an architectural extension of the original idea, not a verbatim formula from the paper, but it follows directly from the paper’s reward-proportional design logic and target-conditioning scheme.
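As a sanity check on the scalarization, the reward above takes only a few lines to compute. The weight values here are illustrative placeholders, and the oracle scores arrive as a plain dict rather than real model outputs:

```python
import math

# Illustrative lambda weights; a real run would tune these.
WEIGHTS = {"stab": 1.0, "bind": 1.0, "kcat": 1.0, "Km": 1.0,
           "eff": 1.0, "unc": 0.5, "pack": 0.5}

def reward(s: dict, w: dict = WEIGHTS) -> float:
    """Exponentiated weighted sum of oracle scores: positive terms are
    rewarded (stability, binding, k_cat, efficiency), negative terms are
    penalized (K_m, uncertainty, packing penalty)."""
    exponent = (w["stab"] * s["stab"] + w["bind"] * s["bind"]
                + w["kcat"] * s["kcat"] - w["Km"] * s["Km"]
                + w["eff"] * s["eff"] - w["unc"] * s["unc"]
                - w["pack"] * s["pack"])
    return math.exp(exponent)
```

Keeping the reward strictly positive via the exponential is what makes it usable as an unnormalized density for reward-proportional sampling.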

Seen this way, the oracle stack becomes beautifully differentiated. SPURS asks: will these mutations destabilize the fold? BioEmu asks: does the whole chain still behave like a coherent protein ensemble, and what is the estimated thermostability? UMA asks: does a higher-fidelity physical model agree, and what is its estimate of the thermostability? GraphKcat asks: what does the catalytic pocket geometry imply? KcatNet asks: what does the broader enzyme–substrate interaction pattern imply for turnover at scale? MMKcat asks: what do we still believe when information is missing? Together, they form not just a scoring pipeline but a hierarchy of scientific skepticism.
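Operationally, that hierarchy of skepticism is just a filtering cascade: cheap oracles prune most candidates so that expensive ones only score the survivors. A generic sketch, in which the scorers and thresholds are stand-ins rather than the project's actual gating rules:

```python
def cascade(candidates, stages):
    """Filter candidates through (score_fn, threshold) stages, ordered
    from cheapest to costliest oracle; stop early if nothing survives."""
    survivors = list(candidates)
    for score_fn, threshold in stages:
        survivors = [c for c in survivors if score_fn(c) >= threshold]
        if not survivors:
            break
    return survivors
```

In a ThermoGFN-IF-style loop, the first stage would play the role of SPURS, later stages BioEmu and UMA, with the catalytic oracles slotted in wherever their cost profile fits.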

That is why this augmented ThermoGFN-IF project is compelling. It points toward a future in which enzyme design is not divided into disconnected stages (first stabilize, then dock, then test kinetics) but handled as a single conditional generative problem over structure, dynamics, binding, and catalysis. The generator does not hallucinate in a vacuum. It is continuously shaped by a council of oracles, each good at a different scale of biochemical truth.

Training Paradigms

UMA-only GFlowNets Training


Kinetic Parameter Oracle GFlowNets Training


Binding Affinity and Thermostability GFlowNets Training Paradigms

The binding-affinity and thermostability GFlowNet-style RL training paradigms are similar, with the $T_m$ paradigm additionally utilizing the BioEmu and SPURS oracles, and the affinity paradigm leaning more heavily on UMA.

We also note that we utilize RFdiffusion3 and RosettaFold3 to construct part of the datasets; the training paradigm is fully synthetic, with the option of later training on a subset of the ReactZyme dataset.

We plan to validate several of the designs in the wet lab once the model is complete. For more information on the model, see the manuscript in the codebase repo.
