Drone Planner β Qwen3-4B (exp-075)
A fine-tuned Qwen3-4B model for autonomous drone mission planning. Given a drone observation or pilot command, it outputs a structured JSON mission decision covering reasoning, actions, memory, and mission status.
π Performance
| Metric | Score |
|---|---|
| T1 Regression (89.3%) | 175/196 |
| T0 JSON Schema Validity | 100% (196/196) |
| Training Loss | 0.191 |
| Training Examples | 4,241 |
| Epochs | 5 |
All-time best on the drone planner regression suite as of March 2026.
Task
Given a drone sensor observation or pilot command, output a structured JSON object:
{
"reasoning": "1-2 sentence analysis of the situation",
"next_actions": ["Action1", "Action2"],
"memory_update": "What to remember for context",
"mission_status": "Continuing | Completed | Aborted | Paused | RTB | Emergency"
}
System Prompt
You are a drone mission planner. Output ONLY a JSON object. No other text.
SCHEMA:
{"reasoning": "<1-2 sentence analysis>", "next_actions": ["<action1>"], "memory_update": "<what to remember>", "mission_status": "<one of: Continuing, Completed, Aborted, Paused, RTB, Emergency>"}
RULES:
- mission_status MUST be exactly one of: Continuing, Completed, Aborted, Paused, RTB, Emergency
- Battery <20%: Emergency. Battery 20-30%: RTB. Battery >30%: normal operations
- Loss of GPS/comms: Emergency
- Hardware failure (motor warning, sensor failure): RTB or Emergency based on severity
- Mission objectives met + returning home: Completed
- Pilot says stop/abort/land/cancel: Aborted
- Pilot says hold/wait/freeze/standby: Paused
- Normal operations, mission in progress: Continuing
- Wind >35mph or severe weather: RTB or Paused
- Geofence/airspace violation: RTB immediately
- People/aircraft detected nearby: Paused or RTB based on proximity
- Property boundary noted but NOT violated: Continuing (note in memory, adjust path)
- Property boundary violated or trespassing: RTB
- Law enforcement present: Aborted
- Multiple minor issues combined: escalate to RTB (err on side of caution)
- When single marginal condition (poor visibility alone, low signal alone): Continuing unless battery also low
GROUNDING:
- Only reference waypoint numbers, altitudes, distances, and speeds that appear in the input
- If the input says "Waypoint 5/12", reference those numbers. Do NOT invent waypoint numbers
- Base your reasoning on facts stated in the observation, not assumptions
Input Format
Natural language pilot commands or structured sensor observations:
OBSERVATION: Scanning waypoint 8 of 15. No detections. Battery 74%.
PILOT: take a photo
OBSERVATION: Battery 22%. Signal dropping. GPS strong.
Output Examples
Normal scan:
{"reasoning": "At waypoint 8/15. No detections yet. Continuing scan.", "next_actions": ["Continue to next waypoint", "Maintain scan altitude"], "memory_update": "Scanning wp 8/15", "mission_status": "Continuing"}
Battery RTB:
{"reasoning": "Battery at 22% triggers RTB protocol.", "next_actions": ["Return to home"], "memory_update": "RTB: battery 22%", "mission_status": "RTB"}
Law enforcement:
{"reasoning": "Law enforcement on site requires immediate mission abort.", "next_actions": ["Land immediately", "Secure drone"], "memory_update": "Aborted: law enforcement present", "mission_status": "Aborted"}
Action Vocabulary
Key canonical actions learned by this model:
| Category | Actions |
|---|---|
| Navigation | Continue scan, Continue to next waypoint, Continue tracking, Maintain altitude, Return to home |
| Safety | Emergency land, Begin RTB immediately, Hold position |
| Camera | Take photo, Increase zoom 2x, Set gimbal pitch -45Β° |
| Tracking | Track vehicle, Log detection, Mark GPS location |
| Pilot | Climb, Move left, Move right, Bank left |
Key routing rules:
Continue scanβ no explicit waypoint number in inputContinue to next waypointβ input explicitly states "waypoint X of Y"Begin RTBβ GPS-weak compound emergenciesReturn to homeβ standard battery/wind/boundary RTB- Weak signal alone (17%) + good battery β
Continuing, not RTB
Training Details
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3-4B |
| Method | QLoRA |
| LoRA rank | 128 |
| LoRA alpha | 256 |
| Dropout | 0.0 |
| Epochs | 5 |
| Learning rate | 4e-5 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch 16) |
| Max sequence length | 2048 |
| Packing | Enabled |
| Training hardware | NVIDIA RTX 3090 24GB |
| Training time | ~85 min |
What Worked (Lessons Learned)
After 75+ experiments, key findings:
- Epochs=5 is the sweet spot β epochs=4 undertrained (loss ~0.21, T1 ~85%), epochs=5 hit the target (loss ~0.14β0.19, T1 ~88β89%)
- Data quality > data quantity β curated 4,241 examples outperformed earlier 4,863 examples with contradictions
- Scan/waypoint disambiguation was the biggest single fix β 181 training examples incorrectly used "Continue to next waypoint" with no waypoint number in input; correcting this recovered ~7pp
- System prompt bias kills scores β "When in doubt choose RTB" in system prompt caused ~10 extra false RTBs; removing it recovered ~3pp
- Packing essential β without packing, 4Γ more steps, same GPU time but worse convergence per step
- Dropout=0 required for Unsloth fast path β any dropout disables Unsloth's 5Γ patching speedup
- Fresh process per experiment β Unsloth's in-process loop leaks packing collator state;
--mode singlewith subprocess restart fixed it - Signal 17% + good battery = Continuing β regression confirms weak signal alone doesn't trigger RTB; only signal+battery compound does
- Begin RTB β Return to home β regression uses "Begin RTB" for GPS-weak compound emergencies, "Return to home" for standard RTB; merging them hurt scores
Path to 95% T1
Current gap: 21 failures / 196 = 10.7pp remaining
Top remaining failure clusters:
- Memory specificity (~5 failures) β "Scanning mission 66% complete" vs "Scanning scan in progress"; model loses specific progress context
- Signal 17%/bat 96% β RTB (~3 failures) β model still occasionally over-triggers RTB on weak signal alone
- Compound warning thresholds (~3 failures) β "Continue with caution" vs "Return to home" for motor warm + battery 50%
- Maintain altitude vs Continue tracking (~3 failures) β tracking context with good battery
- Property boundary nuance (~2 failures) β "noted but not violated" vs "violated" boundary handling
- Hallucination (~5 failures) β model adds "Send distress alert" or extra battery/RTB context not in input
Recommended next steps:
- Add 20β30 targeted examples for each cluster above (especially compound warning thresholds)
- For reg[193] (internally contradictory case) β decide ground truth before next run
- Consider
lora_r=64experiment β lower rank may reduce overfitting on hard cases - Try 6 epochs β loss was 0.191 at 5 epochs vs 0.140 at exp-072; may benefit from more training
- Review the 21 remaining regression failures for any data contradictions
Files
adapter_config.json+adapter_model.safetensorsβ LoRA adapter (load on top of Qwen3-4B)scorecard.jsonβ Full evaluation results with all failure casestrain_meta.jsonβ Training metadata, hyperparameters, and loss curvetokenizer.*β Tokenizer files
Part of LlamaFarm
This model is part of the LlamaFarm autonomous drone system.