# RL Job-Shop Scheduler
Reinforcement learning for job-shop scheduling: an agent learns to dispatch jobs to machines to minimize makespan (or another objective) using a Gym-style environment and stable-baselines3.
## Motivation
Job-shop scheduling is NP-hard. RL can learn dispatching policies from experience without hand-crafted heuristics. This project provides a small JSP environment and trains a DQN or PPO agent as a baseline.
## Environment
- State: Current time, remaining operations per job, machine availability (simplified vector).
- Actions: Which job to schedule next on which machine (discrete action space).
- Reward: Negative makespan delta or sparse reward at episode end.
- Implemented in `env.py` with a Gym-style interface; a minimal sketch follows below.
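A minimal sketch of what such an environment could look like. It assumes the observation is each job's next-operation index concatenated with per-machine availability times, and that each action picks the next job to dispatch; the actual `env.py`, its constructor, and its instance format may differ:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class JobShopEnv(gym.Env):
    """Sketch of a job-shop dispatching environment (assumed API, not the project's exact one).

    Each action picks a job whose next operation is dispatched to its required
    machine; the reward is the negative increase in makespan.
    """

    def __init__(self, proc_times):
        # proc_times[j][k] = (machine, duration) for operation k of job j (assumed format).
        self.proc_times = proc_times
        self.n_jobs = len(proc_times)
        self.n_machines = 1 + max(m for job in proc_times for m, _ in job)
        # Action: index of the job to dispatch next.
        self.action_space = spaces.Discrete(self.n_jobs)
        # Observation: next-operation index per job + availability time per machine.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(self.n_jobs + self.n_machines,), dtype=np.float32
        )

    def _obs(self):
        return np.concatenate([self.next_op, self.machine_free]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.next_op = np.zeros(self.n_jobs)           # index of next operation per job
        self.job_free = np.zeros(self.n_jobs)          # time each job becomes free
        self.machine_free = np.zeros(self.n_machines)  # time each machine becomes free
        self.makespan = 0.0
        return self._obs(), {}

    def step(self, action):
        job = int(action)
        if self.next_op[job] >= len(self.proc_times[job]):
            # Invalid action (job already finished): small penalty, no state change.
            return self._obs(), -1.0, False, False, {}
        machine, duration = self.proc_times[job][int(self.next_op[job])]
        start = max(self.job_free[job], self.machine_free[machine])
        end = start + duration
        self.job_free[job] = self.machine_free[machine] = end
        self.next_op[job] += 1
        # Negative makespan delta: zero unless this operation extends the schedule.
        reward = -(max(self.makespan, end) - self.makespan)
        self.makespan = max(self.makespan, end)
        done = bool((self.next_op >= [len(j) for j in self.proc_times]).all())
        return self._obs(), reward, done, False, {}
```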
## Files
- `env.py` – Gymnasium `JobShopEnv` (state, actions, reward).
- `train.py` – PPO training with stable-baselines3; saves checkpoints to `./checkpoints/`.
- `baseline_ortools.py` – OR-Tools CP-SAT on a small JSP instance (separate from the RL env, for reference).
## Usage
```bash
pip install -r requirements.txt
python train.py
```
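For reference, a training script along these lines could look like the sketch below. The `JobShopEnv` constructor and the instance format mirror the environment sketch above and are assumptions, not necessarily the project's exact API:

```python
import os
from stable_baselines3 import PPO
from env import JobShopEnv  # the project's environment (assumed constructor)

# Hypothetical 2-job x 2-machine instance: (machine, duration) per operation.
instance = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
env = JobShopEnv(instance)

os.makedirs("checkpoints", exist_ok=True)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)
model.save("./checkpoints/ppo_jobshop")
```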
Optional: run `baseline_ortools.py` to compare with an OR-Tools CP-SAT or MIP baseline on the same instances.
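A CP-SAT baseline for the same kind of instance could be modeled roughly as follows; this is a sketch, not necessarily how `baseline_ortools.py` is written:

```python
from ortools.sat.python import cp_model

# Hypothetical 2-job x 2-machine instance: (machine, duration) per operation.
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
horizon = sum(d for job in jobs for _, d in job)

model = cp_model.CpModel()
machine_intervals = {}  # machine -> list of interval variables
ends = []               # end variable of each job's last operation
for j, job in enumerate(jobs):
    prev_end = None
    for k, (m, d) in enumerate(job):
        start = model.NewIntVar(0, horizon, f"s_{j}_{k}")
        end = model.NewIntVar(0, horizon, f"e_{j}_{k}")
        iv = model.NewIntervalVar(start, d, end, f"iv_{j}_{k}")
        machine_intervals.setdefault(m, []).append(iv)
        if prev_end is not None:
            model.Add(start >= prev_end)  # operations of a job run in order
        prev_end = end
    ends.append(prev_end)

for ivs in machine_intervals.values():
    model.AddNoOverlap(ivs)  # one operation at a time per machine

makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, ends)
model.Minimize(makespan)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("makespan:", solver.Value(makespan))
```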
## Model
- PPO or DQN from stable-baselines3; the default is PPO for stability.
- Checkpoints are saved in `./checkpoints/`.
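Loading a saved checkpoint and rolling out one greedy episode could look like this; the checkpoint name and env constructor follow the sketches above and are assumptions:

```python
from stable_baselines3 import PPO
from env import JobShopEnv  # assumed constructor, see the environment sketch

# Hypothetical instance matching the training sketch above.
env = JobShopEnv([[(0, 3), (1, 2)], [(1, 2), (0, 4)]])
model = PPO.load("./checkpoints/ppo_jobshop")

obs, _ = env.reset()
done = False
total = 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, _ = env.step(action)
    total += reward
# Under the delta reward, the episode return equals the negative final makespan.
print("episode return:", total)
```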
## Limitations / future work
- Small instances only; scaling to large JSP would need a different state/action representation (e.g. graph neural networks).
- Optional: add more problem types (flow-shop, flexible job-shop).
## Author
Alireza Aminzadeh
- Email: alireza.aminzadeh@hotmail.com
- Hugging Face: syeedalireza
- LinkedIn: alirezaaminzadeh