Drone Planner — Qwen3-4B (exp-075)

A fine-tuned Qwen3-4B model for autonomous drone mission planning. Given a drone observation or pilot command, it outputs a structured JSON mission decision covering reasoning, actions, memory, and mission status.

🏆 Performance

Metric	Score
T1 Regression (89.3%)	175/196
T0 JSON Schema Validity	100% (196/196)
Training Loss	0.191
Training Examples	4,241
Epochs	5

All-time best on the drone planner regression suite as of March 2026.

Task

Given a drone sensor observation or pilot command, output a structured JSON object:

{
  "reasoning": "1-2 sentence analysis of the situation",
  "next_actions": ["Action1", "Action2"],
  "memory_update": "What to remember for context",
  "mission_status": "Continuing | Completed | Aborted | Paused | RTB | Emergency"
}

System Prompt

You are a drone mission planner. Output ONLY a JSON object. No other text.

SCHEMA:
{"reasoning": "<1-2 sentence analysis>", "next_actions": ["<action1>"], "memory_update": "<what to remember>", "mission_status": "<one of: Continuing, Completed, Aborted, Paused, RTB, Emergency>"}

RULES:
- mission_status MUST be exactly one of: Continuing, Completed, Aborted, Paused, RTB, Emergency
- Battery <20%: Emergency. Battery 20-30%: RTB. Battery >30%: normal operations
- Loss of GPS/comms: Emergency
- Hardware failure (motor warning, sensor failure): RTB or Emergency based on severity
- Mission objectives met + returning home: Completed
- Pilot says stop/abort/land/cancel: Aborted
- Pilot says hold/wait/freeze/standby: Paused
- Normal operations, mission in progress: Continuing
- Wind >35mph or severe weather: RTB or Paused
- Geofence/airspace violation: RTB immediately
- People/aircraft detected nearby: Paused or RTB based on proximity
- Property boundary noted but NOT violated: Continuing (note in memory, adjust path)
- Property boundary violated or trespassing: RTB
- Law enforcement present: Aborted
- Multiple minor issues combined: escalate to RTB (err on side of caution)
- When single marginal condition (poor visibility alone, low signal alone): Continuing unless battery also low

GROUNDING:
- Only reference waypoint numbers, altitudes, distances, and speeds that appear in the input
- If the input says "Waypoint 5/12", reference those numbers. Do NOT invent waypoint numbers
- Base your reasoning on facts stated in the observation, not assumptions

Input Format

Natural language pilot commands or structured sensor observations:

OBSERVATION: Scanning waypoint 8 of 15. No detections. Battery 74%.

PILOT: take a photo

OBSERVATION: Battery 22%. Signal dropping. GPS strong.

Output Examples

Normal scan:

{"reasoning": "At waypoint 8/15. No detections yet. Continuing scan.", "next_actions": ["Continue to next waypoint", "Maintain scan altitude"], "memory_update": "Scanning wp 8/15", "mission_status": "Continuing"}

Battery RTB:

{"reasoning": "Battery at 22% triggers RTB protocol.", "next_actions": ["Return to home"], "memory_update": "RTB: battery 22%", "mission_status": "RTB"}

Law enforcement:

{"reasoning": "Law enforcement on site requires immediate mission abort.", "next_actions": ["Land immediately", "Secure drone"], "memory_update": "Aborted: law enforcement present", "mission_status": "Aborted"}

Action Vocabulary

Key canonical actions learned by this model:

Category	Actions
Navigation	`Continue scan`, `Continue to next waypoint`, `Continue tracking`, `Maintain altitude`, `Return to home`
Safety	`Emergency land`, `Begin RTB immediately`, `Hold position`
Camera	`Take photo`, `Increase zoom 2x`, `Set gimbal pitch -45°`
Tracking	`Track vehicle`, `Log detection`, `Mark GPS location`
Pilot	`Climb`, `Move left`, `Move right`, `Bank left`

Key routing rules:

Continue scan — no explicit waypoint number in input
Continue to next waypoint — input explicitly states "waypoint X of Y"
Begin RTB — GPS-weak compound emergencies
Return to home — standard battery/wind/boundary RTB
Weak signal alone (17%) + good battery → Continuing, not RTB

Training Details

Parameter	Value
Base model	`unsloth/Qwen3-4B`
Method	QLoRA
LoRA rank	128
LoRA alpha	256
Dropout	0.0
Epochs	5
Learning rate	4e-5
Batch size	2
Gradient accumulation	8 (effective batch 16)
Max sequence length	2048
Packing	Enabled
Training hardware	NVIDIA RTX 3090 24GB
Training time	~85 min

What Worked (Lessons Learned)

After 75+ experiments, key findings:

Epochs=5 is the sweet spot — epochs=4 undertrained (loss ~0.21, T1 ~85%), epochs=5 hit the target (loss ~0.14–0.19, T1 ~88–89%)
Data quality > data quantity — curated 4,241 examples outperformed earlier 4,863 examples with contradictions
Scan/waypoint disambiguation was the biggest single fix — 181 training examples incorrectly used "Continue to next waypoint" with no waypoint number in input; correcting this recovered ~7pp
System prompt bias kills scores — "When in doubt choose RTB" in system prompt caused ~10 extra false RTBs; removing it recovered ~3pp
Packing essential — without packing, 4× more steps, same GPU time but worse convergence per step
Dropout=0 required for Unsloth fast path — any dropout disables Unsloth's 5× patching speedup
Fresh process per experiment — Unsloth's in-process loop leaks packing collator state; --mode single with subprocess restart fixed it
Signal 17% + good battery = Continuing — regression confirms weak signal alone doesn't trigger RTB; only signal+battery compound does
Begin RTB ≠ Return to home — regression uses "Begin RTB" for GPS-weak compound emergencies, "Return to home" for standard RTB; merging them hurt scores

Path to 95% T1

Current gap: 21 failures / 196 = 10.7pp remaining

Top remaining failure clusters:

Memory specificity (~5 failures) — "Scanning mission 66% complete" vs "Scanning scan in progress"; model loses specific progress context
Signal 17%/bat 96% → RTB (~3 failures) — model still occasionally over-triggers RTB on weak signal alone
Compound warning thresholds (~3 failures) — "Continue with caution" vs "Return to home" for motor warm + battery 50%
Maintain altitude vs Continue tracking (~3 failures) — tracking context with good battery
Property boundary nuance (~2 failures) — "noted but not violated" vs "violated" boundary handling
Hallucination (~5 failures) — model adds "Send distress alert" or extra battery/RTB context not in input

Recommended next steps:

Add 20–30 targeted examples for each cluster above (especially compound warning thresholds)
For reg[193] (internally contradictory case) — decide ground truth before next run
Consider lora_r=64 experiment — lower rank may reduce overfitting on hard cases
Try 6 epochs — loss was 0.191 at 5 epochs vs 0.140 at exp-072; may benefit from more training
Review the 21 remaining regression failures for any data contradictions

Files

adapter_config.json + adapter_model.safetensors — LoRA adapter (load on top of Qwen3-4B)
scorecard.json — Full evaluation results with all failure cases
train_meta.json — Training metadata, hyperparameters, and loss curve
tokenizer.* — Tokenizer files

Part of LlamaFarm

This model is part of the LlamaFarm autonomous drone system.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for llama-farm/drone-planner-qwen3-4b

Base model

Qwen/Qwen3-4B-Base

Finetuned

Qwen/Qwen3-4B

Finetuned

unsloth/Qwen3-4B

Finetuned

(599)

this model