Ajitg25 committed on
Commit db91dc6 · verified · 1 Parent(s): 342a725

Add Blog.md writeup

Files changed (1)
  1. Blog.md +171 -0
Blog.md ADDED
@@ -0,0 +1,171 @@
+ # Can an LLM Learn to Save Lives by Managing City Traffic?
+
+ **tl;dr:** We built an OpenEnv environment that trains an LLM to act as both emergency dispatcher and city traffic signal manager. After GRPO training, signal efficiency jumped from **11% → 100%**. Here's how it works and why it genuinely needs an LLM, not just a rule.
+
+ ---
+
+ ## The Problem
+
+ In a cardiac emergency, every minute of delay costs roughly 10% of the patient's survival probability.
+
+ Existing GPS-based emergency preemption systems (like Opticom) clear one traffic signal when an ambulance is 300m away. That's reactive, single-intersection, and has no awareness of what lies ahead.
+
+ Our environment asks: **can an LLM reason about the full journey (hospital selection, road quality, live traffic, and dynamic events) to get the ambulance there faster?**
+
+ ---
+
+ ## Why This Needs an LLM (Not a Rule)
+
+ Consider this scenario:
+
+ - **Hospital A:** 6 intersections away, but 3 road segments are gridlocked. Clearing signals helps, but heavy traffic means the ambulance crawls at ~20% speed even on green.
+ - **Hospital B:** 8 intersections away, with lighter traffic and highway-quality roads. Its ETA is actually 40 seconds faster.
+ - **Midway:** an accident blocks the planned route. The system must re-route in real time.
+
+ A simple fixed rule such as "nearest hospital, clear every signal" gets this wrong. The agent must simultaneously reason about:
+
+ - Distance vs. traffic volume vs. road quality
+ - Hospital specialization (cardiac patient → cardiac centre, not general hospital)
+ - Dynamic events appearing mid-journey (accidents, road closures, traffic spikes)
+ - Which signals actually need clearing: toggling an already-green signal wastes an action and costs reward
+
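To make the Hospital A vs. Hospital B trade-off concrete, here is a minimal sketch of an ETA comparison. The per-segment free-flow times and speed factors are illustrative assumptions chosen to reproduce the 40-second gap in the scenario; they are not values taken from the environment, and `route_eta` is a hypothetical helper.

```python
def route_eta(segments):
    """Estimate route time as the sum of (free-flow time / speed factor) per segment."""
    return sum(free_flow_s / speed_factor for free_flow_s, speed_factor in segments)


# Hypothetical segments: (free-flow seconds, fraction of free-flow speed achieved).
hospital_a = [(10, 1.0)] * 3 + [(10, 0.2)] * 3  # 6 segments, 3 gridlocked at ~20% speed
hospital_b = [(14, 0.8)] * 8                    # 8 longer segments, lighter traffic

eta_a = route_eta(hospital_a)
eta_b = route_eta(hospital_b)

closer = "A" if len(hospital_a) < len(hospital_b) else "B"
faster = "A" if eta_a < eta_b else "B"
print(f"closer: {closer}, faster: {faster}, gap: {eta_a - eta_b:.0f}s")
```

With these made-up numbers the closer hospital is not the faster one, which is the situation a nearest-hospital rule cannot see.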
+ ---
+
+ ## The Environment
+
+ Built on **[OpenEnv](https://github.com/meta-pytorch/OpenEnv)**, the hackathon framework for LLM training environments.
+
+ ### What the agent sees each step
+
+ ```
+ === EMERGENCY DISPATCH ===
+ Patient : (6, 3) | condition: cardiac
+ Ambulance: (6, 4) | time: 40s / 300s
+
+ ⚠ DYNAMIC EVENTS:
+ [ACCIDENT] at (4,3) - blocking road (severity=0.8)
+
+ CURRENT ROUTE → hosp_a
+ ETA=251s | segments=8 | damaged=2 | heavy_traffic=1
+ (6,4)→(5,4) | residential | quality=moderate | traffic=45% | est=22s
+ (5,4)→(4,4) | damaged | quality=POTHOLED | traffic=62% | est=41s [BLOCKED]
+
+ ALTERNATIVES (consider switching if ETA much lower):
+ hosp_c (cardiac) <- specialist match: ETA=130s | damaged=0 | heavy=0
+
+ HOSPITALS:
+ hosp_a: City General | spec=general | est=251s
+ hosp_c: Cardiac Centre | spec=cardiac | est=130s <- specialist match
+
+ SIGNALS (only change WRONG ones):
+ (5,4): ns_green | dir=north | OK
+ (4,4): ew_green | dir=north | WRONG - needs ns_green
+
+ ACTION: {"hospital_id": "hosp_c", "signal_controls": [{"row": 4, "col": 4, "phase": "ns_green"}], "preferred_direction": null}
+ ```
+
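The `ACTION:` line in the observation above is plain JSON, so a model completion has to be parsed and checked before it can be sent to the environment. The sketch below shows one way that could look. `parse_action` and `VALID_PHASES` are hypothetical names, the two phases are simply the ones that appear in the sample observation, and the real environment may accept or reject actions differently.

```python
import json

# Assumption: only the two phases shown in the sample observation are valid.
VALID_PHASES = {"ns_green", "ew_green"}


def parse_action(text):
    """Extract and validate the JSON action from a model completion.

    Returns a dict with the three keys from the sample ACTION line,
    or None if the completion does not contain a usable action.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        action = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict):
        return None

    controls = action.get("signal_controls") or []
    if not isinstance(controls, list):
        return None
    for control in controls:
        if not isinstance(control, dict) or control.get("phase") not in VALID_PHASES:
            return None
        if not isinstance(control.get("row"), int) or not isinstance(control.get("col"), int):
            return None

    return {
        "hospital_id": action.get("hospital_id"),
        "signal_controls": controls,
        "preferred_direction": action.get("preferred_direction"),
    }


sample = ('ACTION: {"hospital_id": "hosp_c", "signal_controls": '
          '[{"row": 4, "col": 4, "phase": "ns_green"}], "preferred_direction": null}')
parsed = parse_action(sample)
```

Returning `None` for malformed output lets a training loop treat an unparseable completion as a failed step instead of crashing.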
+ ### Reward function: designed to be hard to game
+
+ | Component | Value |
+ |---|---|
+ | Arrival bonus | +1000 |
+ | Time bonus | up to +500 (faster = more) |
+ | Specialist hospital match | +300 |
+ | Red light stop | −20 each |
+ | **Unnecessary signal toggle** | **−2/−5 each** |
+ | Damaged road segments traversed | −10 each |
+ | Successful re-route | +50 each |
+
+ The unnecessary toggle penalty is the key design decision. An agent that blindly clears every signal it sees scores *worse* than one that reads the signal state first. This forces the LLM to actually reason about observations rather than pattern-match to a fixed action.
+
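As a rough illustration of how the table combines, here is a sketch of an episode-level reward. Several details are assumptions rather than the environment's actual code: the time bonus is taken to be linear in the unused fraction of the time limit, the specialist bonus is only paid on arrival, and the toggle penalty uses the smaller of the two listed values (−2). `episode_reward` is a hypothetical function name.

```python
def episode_reward(arrived, time_used_s, time_limit_s, specialist_match,
                   red_stops, unnecessary_toggles, damaged_segments, reroutes,
                   toggle_penalty=2):
    """Combine the reward table into one number (sketch, see assumptions above)."""
    reward = 0.0
    if arrived:
        reward += 1000
        # Assumed shape: linear in the unused fraction of the time limit, capped at +500.
        reward += 500 * max(0.0, 1 - time_used_s / time_limit_s)
        if specialist_match:
            reward += 300
    reward -= 20 * red_stops
    reward -= toggle_penalty * unnecessary_toggles
    reward -= 10 * damaged_segments
    reward += 50 * reroutes
    return reward


# Two otherwise identical episodes: one agent toggles 8 signals that were already correct.
common = dict(arrived=True, time_used_s=125, time_limit_s=200, specialist_match=True,
              red_stops=0, damaged_segments=0, reroutes=0)
careful = episode_reward(unnecessary_toggles=0, **common)
blind = episode_reward(unnecessary_toggles=8, **common)
print(careful - blind)
```

Under these assumptions the blind agent always scores lower, but only by a small margin relative to the arrival bonus.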
+ ### Difficulty levels
+
+ | Level | Grid | Hospitals | Traffic | Dynamic Events | Time Limit |
+ |---|---|---|---|---|---|
+ | easy | 6×6 | 2 | Low | 5%/step | 200s |
+ | medium | 8×8 | 3 | Moderate | 10%/step | 300s |
+ | hard | 12×12 | 5 (1 at capacity) | Heavy | 15%/step | 400s |
+
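The per-step event rates in the table compound quickly over an episode. The sketch below encodes the table as a config and computes the chance of seeing at least one dynamic event, assuming events are drawn independently at each step. The `Difficulty` dataclass, the `LEVELS` mapping, and the 10-step episode length are illustrative assumptions, not the environment's own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Difficulty:
    grid: int            # grid is grid x grid intersections
    hospitals: int
    event_prob: float    # probability of a dynamic event per step
    time_limit_s: int


# Values copied from the difficulty table.
LEVELS = {
    "easy": Difficulty(grid=6, hospitals=2, event_prob=0.05, time_limit_s=200),
    "medium": Difficulty(grid=8, hospitals=3, event_prob=0.10, time_limit_s=300),
    "hard": Difficulty(grid=12, hospitals=5, event_prob=0.15, time_limit_s=400),
}


def p_any_event(level, steps):
    """P(at least one event in `steps` steps) = 1 - (1 - p)^steps, assuming independence."""
    p = LEVELS[level].event_prob
    return 1 - (1 - p) ** steps


for name in LEVELS:
    # 10 steps is a hypothetical episode length used only for illustration.
    print(f"{name}: {p_any_event(name, 10):.0%}")
```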
+ ---
+
+ ## Training
+
+ - **Model:** `Qwen/Qwen2.5-0.5B-Instruct` + LoRA (r=16, 2.1M trainable params)
+ - **Algorithm:** GRPO (Group Relative Policy Optimisation via HuggingFace TRL)
+ - **Setup:** 10 iterations × 4 episodes per iteration
+ - **Environment:** Live OpenEnv server running alongside training loop
+
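GRPO scores each sampled completion relative to the other samples in its group instead of against a learned value function. The sketch below shows that group-relative normalisation in isolation. Treating the 4 episodes of one iteration as a single group is an assumption, the rewards are made-up numbers, and the exact normalisation inside TRL may differ (for example in how the standard deviation is computed).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each reward against its group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical episode rewards for one group of 4 rollouts.
rewards = [1440.0, 1445.0, 1290.0, 1450.0]
advantages = group_relative_advantages(rewards)

# The weakest rollout gets a negative advantage; the strongest gets the largest positive one.
worst = advantages.index(min(advantages))
best = advantages.index(max(advantages))
```

Because advantages are relative, a group where every rollout earns the same reward produces no learning signal, which is one reason reward components that separate otherwise similar episodes (such as the toggle penalty) matter.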
+ ![Training curves](ambulance_training_results.png)
+ *Four panels: Episode reward, Hospital arrival rate, Signal efficiency (11% → 100%), Adaptive re-routing*
+
+ ### Results
+
+ | Metric | Baseline (untrained) | Trained | Change |
+ |---|---|---|---|
+ | Arrival rate | 100% | 100% | none |
+ | **Signal efficiency** | **11%** | **100%** | **+89 percentage points** |
+ | Mean reward | 1442.6 | 1445.3 | +2.7 |
+ | Mean travel time | 125s | 127.5s | +2.5s |
+
+ ### What the numbers mean
+
+ **Signal efficiency is the headline metric.** The untrained model toggled every signal it saw, including ones already in the correct phase, and incurred unnecessary-toggle penalties on every step. After GRPO training, the model learned to compare `sig.phase` against `sig.ambulance_direction` and to act only when a signal genuinely needs changing. The toggle penalty is small next to the arrival and time bonuses, which is consistent with mean reward moving by only +2.7.
+
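One plausible way to compute signal efficiency is the fraction of issued signal changes that were actually needed. The sketch below follows that definition, which is an assumption: the post does not spell out the exact formula, and `required_phase` and `signal_efficiency` are hypothetical helpers. The phase rule (north/south travel needs `ns_green`, otherwise `ew_green`) matches the sample observation and the usage snippet in this post.

```python
def required_phase(ambulance_direction):
    """North/south travel needs ns_green; east/west travel needs ew_green."""
    return "ns_green" if ambulance_direction in ("north", "south") else "ew_green"


def signal_efficiency(toggles):
    """Fraction of toggled signals that were in the wrong phase before the toggle.

    `toggles` is a list of (phase_before, ambulance_direction) pairs, one per
    signal the agent chose to change. No toggles counts as fully efficient.
    """
    if not toggles:
        return 1.0
    needed = sum(1 for phase, direction in toggles if phase != required_phase(direction))
    return needed / len(toggles)


# Hypothetical untrained behaviour: 9 toggles, only 1 of which was needed.
blind = [("ew_green", "north")] + [("ns_green", "north")] * 8
# Hypothetical trained behaviour: only the wrong-phase signal is changed.
selective = [("ew_green", "north")]

print(f"{signal_efficiency(blind):.0%} -> {signal_efficiency(selective):.0%}")
```

One needed toggle out of nine is about 11%, so a toggle-everything policy can land near the baseline figure under this definition.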
+ The training curve shows characteristic GRPO behaviour:
+ - **Iteration 1:** the model arrives but wastes actions (efficiency=11%)
+ - **Iterations 2–4:** exploration phase, in which the model tries aggressive strategies and arrival drops to 0–25%
+ - **Iterations 5–10:** sharp convergence to 100% arrival, 100% signal efficiency, and stable reward
+
+ This exploration→convergence pattern is the training story. A rule-based system would never show this curve; it would be flat from iteration 1.
+
+ ---
+
+ ## Why This Environment Matters
+
+ Emergency vehicle routing is a real, unsolved problem in smart city infrastructure. Current systems are:
+
+ - **Reactive:** clear one signal at a time, 300m in advance
+ - **Unaware of road quality:** a potholed road still gets treated as highway
+ - **Static:** no dynamic re-routing when accidents occur
+ - **Oblivious to hospital specialization:** the nearest hospital isn't always the right hospital
+
+ An LLM trained on this environment learns to reason about all four simultaneously. That's a capability we have not seen in any deployed system today.
+
+ Could a researcher write a paper about this? Yes: "LLM-based adaptive emergency corridor planning under partial observability and dynamic constraints" is a legitimate research direction this environment enables.
+
+ ---
+
+ ## Try It
+
+ **Live environment:** https://huggingface.co/spaces/Ajitg25/ambulance-green-corridor
+
+ **Code + training notebook:** https://github.com/ajitg25/openEnv-hackathon/tree/final
+
+ ```python
+ import asyncio
+
+ from ambulance_env import AmbulanceEnv, AmbulanceAction, SignalControl
+
+ def needed_phase(direction):
+     # North/south travel needs ns_green; east/west travel needs ew_green
+     return "ns_green" if direction in ("north", "south") else "ew_green"
+
+ async def main():
+     async with AmbulanceEnv(base_url="https://ajitg25-ambulance-green-corridor.hf.space") as env:
+         obs = (await env.reset()).observation
+         # Dispatch to specialist hospital
+         obs = (await env.step(AmbulanceAction(hospital_id="hosp_b"))).observation
+         # Clear only wrong-phase signals
+         controls = [
+             SignalControl(row=s.row, col=s.col, phase=needed_phase(s.ambulance_direction))
+             for s in obs.lookahead_signals
+             if s.phase != needed_phase(s.ambulance_direction)
+         ]
+         result = await env.step(AmbulanceAction(signal_controls=controls))
+
+ asyncio.run(main())
+ ```
+
+ ---
+
+ ## What's Next
+
+ - Scale training to **medium/hard difficulty** (grids up to 12×12, accidents, road closures, potholed roads)
+ - **Multi-agent:** two ambulances sharing one signal controller, which requires cooperation
+ - **Visualization UI:** city grid showing signal states, ambulance position, and re-routing decisions in real time
+
+ ---
+
+ *Built for the OpenEnv Hackathon India 2026.*