# Drone Planner – Qwen3-4B (exp-075)

A fine-tuned Qwen3-4B model for autonomous drone mission planning. Given a drone observation or pilot command, it outputs a structured JSON mission decision covering reasoning, actions, memory, and mission status.

πŸ† Performance

Metric Score
T1 Regression (89.3%) 175/196
T0 JSON Schema Validity 100% (196/196)
Training Loss 0.191
Training Examples 4,241
Epochs 5

All-time best on the drone planner regression suite as of March 2026.

## Task

Given a drone sensor observation or pilot command, the model outputs a structured JSON object:

```json
{
  "reasoning": "1-2 sentence analysis of the situation",
  "next_actions": ["Action1", "Action2"],
  "memory_update": "What to remember for context",
  "mission_status": "Continuing | Completed | Aborted | Paused | RTB | Emergency"
}
```
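A minimal schema check for this output can be written in a few lines. This is an illustrative sketch, not shipped with the model; the `validate_decision` helper name is an assumption:

```python
import json

# Allowed mission_status values and required keys, per the schema above
VALID_STATUS = {"Continuing", "Completed", "Aborted", "Paused", "RTB", "Emergency"}
REQUIRED_KEYS = {"reasoning", "next_actions", "memory_update", "mission_status"}

def validate_decision(raw: str) -> dict:
    """Parse raw model output and check it against the mission-decision schema."""
    obj = json.loads(raw)  # raises on non-JSON output (a T0 failure)
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if obj["mission_status"] not in VALID_STATUS:
        raise ValueError(f"invalid mission_status: {obj['mission_status']!r}")
    if not isinstance(obj["next_actions"], list) or not obj["next_actions"]:
        raise ValueError("next_actions must be a non-empty list")
    return obj
```

A check like this is what the T0 JSON Schema Validity metric above measures: parseable JSON with all four keys and a legal `mission_status`.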

## System Prompt

```text
You are a drone mission planner. Output ONLY a JSON object. No other text.

SCHEMA:
{"reasoning": "<1-2 sentence analysis>", "next_actions": ["<action1>"], "memory_update": "<what to remember>", "mission_status": "<one of: Continuing, Completed, Aborted, Paused, RTB, Emergency>"}

RULES:
- mission_status MUST be exactly one of: Continuing, Completed, Aborted, Paused, RTB, Emergency
- Battery <20%: Emergency. Battery 20-30%: RTB. Battery >30%: normal operations
- Loss of GPS/comms: Emergency
- Hardware failure (motor warning, sensor failure): RTB or Emergency based on severity
- Mission objectives met + returning home: Completed
- Pilot says stop/abort/land/cancel: Aborted
- Pilot says hold/wait/freeze/standby: Paused
- Normal operations, mission in progress: Continuing
- Wind >35mph or severe weather: RTB or Paused
- Geofence/airspace violation: RTB immediately
- People/aircraft detected nearby: Paused or RTB based on proximity
- Property boundary noted but NOT violated: Continuing (note in memory, adjust path)
- Property boundary violated or trespassing: RTB
- Law enforcement present: Aborted
- Multiple minor issues combined: escalate to RTB (err on the side of caution)
- Single marginal condition (poor visibility alone, low signal alone): Continuing unless battery also low

GROUNDING:
- Only reference waypoint numbers, altitudes, distances, and speeds that appear in the input
- If the input says "Waypoint 5/12", reference those numbers. Do NOT invent waypoint numbers
- Base your reasoning on facts stated in the observation, not assumptions
```
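The battery thresholds in the RULES block can be sketched as a plain function. This is an illustrative reference for the rule logic only, not part of the model or its training code:

```python
def battery_status(battery_pct: float) -> str:
    """Map battery level to the mission_status the RULES block prescribes."""
    if battery_pct < 20:
        return "Emergency"   # battery <20%: Emergency
    if battery_pct <= 30:
        return "RTB"         # battery 20-30%: RTB
    return "Continuing"      # battery >30%: normal operations
```

Note the boundary handling: 22% falls in the 20-30% band and routes to RTB, matching the "Battery RTB" output example below.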

## Input Format

Natural language pilot commands or structured sensor observations:

```text
OBSERVATION: Scanning waypoint 8 of 15. No detections. Battery 74%.
PILOT: take a photo
OBSERVATION: Battery 22%. Signal dropping. GPS strong.
```
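At inference time, each input line is paired with the system prompt in a standard chat-messages layout. A minimal sketch; the `build_messages` helper is hypothetical, and `SYSTEM_PROMPT` is abbreviated to the first line of the full prompt above:

```python
# First line only; use the full system prompt shown above in practice.
SYSTEM_PROMPT = "You are a drone mission planner. Output ONLY a JSON object. No other text."

def build_messages(user_input: str) -> list:
    """Pair one observation or pilot command with the system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

The resulting list can be passed to any chat-completion client or to a tokenizer's chat template.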

## Output Examples

Normal scan:

```json
{"reasoning": "At waypoint 8/15. No detections yet. Continuing scan.", "next_actions": ["Continue to next waypoint", "Maintain scan altitude"], "memory_update": "Scanning wp 8/15", "mission_status": "Continuing"}
```

Battery RTB:

```json
{"reasoning": "Battery at 22% triggers RTB protocol.", "next_actions": ["Return to home"], "memory_update": "RTB: battery 22%", "mission_status": "RTB"}
```

Law enforcement:

```json
{"reasoning": "Law enforcement on site requires immediate mission abort.", "next_actions": ["Land immediately", "Secure drone"], "memory_update": "Aborted: law enforcement present", "mission_status": "Aborted"}
```

## Action Vocabulary

Key canonical actions learned by this model:

| Category | Actions |
| --- | --- |
| Navigation | Continue scan, Continue to next waypoint, Continue tracking, Maintain altitude, Return to home |
| Safety | Emergency land, Begin RTB immediately, Hold position |
| Camera | Take photo, Increase zoom 2x, Set gimbal pitch -45° |
| Tracking | Track vehicle, Log detection, Mark GPS location |
| Pilot | Climb, Move left, Move right, Bank left |

Key routing rules:

- Continue scan – no explicit waypoint number in the input
- Continue to next waypoint – input explicitly states "waypoint X of Y"
- Begin RTB – GPS-weak compound emergencies
- Return to home – standard battery/wind/boundary RTB
- Weak signal alone (17%) + good battery → Continuing, not RTB
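The waypoint-based routing rule above can be expressed as a small check. An illustrative sketch; `route_continue_action` is a hypothetical helper, not shipped with the model:

```python
import re

# Matches "waypoint 8 of 15" or "Waypoint 5/12" style phrases
WAYPOINT_RE = re.compile(r"waypoint\s+\d+\s*(?:of\s+|/)\s*\d+", re.IGNORECASE)

def route_continue_action(observation: str) -> str:
    """Pick the canonical continue-action per the routing rules above."""
    if WAYPOINT_RE.search(observation):
        return "Continue to next waypoint"  # explicit "waypoint X of Y" in input
    return "Continue scan"                  # no waypoint number in input
```

This is the distinction behind lesson 3 below: training examples that used "Continue to next waypoint" without a waypoint number in the input were the largest single source of regression failures.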

## Training Details

| Parameter | Value |
| --- | --- |
| Base model | unsloth/Qwen3-4B |
| Method | QLoRA |
| LoRA rank | 128 |
| LoRA alpha | 256 |
| Dropout | 0.0 |
| Epochs | 5 |
| Learning rate | 4e-5 |
| Batch size | 2 |
| Gradient accumulation | 8 (effective batch 16) |
| Max sequence length | 2048 |
| Packing | Enabled |
| Training hardware | NVIDIA RTX 3090 24GB |
| Training time | ~85 min |
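The hyperparameters above, collected into a plain dict for reproduction. The dict layout is illustrative; the actual Unsloth/QLoRA wiring (model loading, trainer setup) is omitted:

```python
# Hyperparameters from the table above. Library-specific wiring is omitted.
TRAIN_CONFIG = {
    "base_model": "unsloth/Qwen3-4B",
    "method": "QLoRA",
    "lora_r": 128,
    "lora_alpha": 256,
    "lora_dropout": 0.0,  # must stay 0.0 for Unsloth's fast path (see lessons below)
    "epochs": 5,
    "learning_rate": 4e-5,
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "max_seq_length": 2048,
    "packing": True,
}

# Effective batch size = per-device batch * gradient accumulation steps
effective_batch = (TRAIN_CONFIG["per_device_batch_size"]
                   * TRAIN_CONFIG["gradient_accumulation_steps"])
```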

## What Worked (Lessons Learned)

After 75+ experiments, key findings:

1. Epochs=5 is the sweet spot – epochs=4 undertrained (loss ~0.21, T1 ~85%); epochs=5 hit the target (loss ~0.14–0.19, T1 ~88–89%)
2. Data quality > data quantity – the curated 4,241-example set outperformed an earlier 4,863-example set that contained contradictions
3. Scan/waypoint disambiguation was the biggest single fix – 181 training examples incorrectly used "Continue to next waypoint" with no waypoint number in the input; correcting this recovered ~7pp
4. System prompt bias kills scores – "When in doubt choose RTB" in the system prompt caused ~10 extra false RTBs; removing it recovered ~3pp
5. Packing is essential – without packing, 4× more steps for the same GPU time, with worse convergence per step
6. Dropout=0 is required for Unsloth's fast path – any dropout disables Unsloth's 5× patching speedup
7. Fresh process per experiment – Unsloth's in-process loop leaks packing collator state; --mode single with a subprocess restart fixed it
8. Signal 17% + good battery = Continuing – the regression suite confirms weak signal alone doesn't trigger RTB; only the signal+battery compound does
9. Begin RTB ≠ Return to home – the regression suite uses "Begin RTB" for GPS-weak compound emergencies and "Return to home" for standard RTB; merging them hurt scores

## Path to 95% T1

Current gap: 21 failures / 196 = 10.7pp remaining.

Top remaining failure clusters:

1. Memory specificity (~5 failures) – "Scanning mission 66% complete" vs "Scanning scan in progress"; the model loses specific progress context
2. Signal 17% / battery 96% → RTB (~3 failures) – the model still occasionally over-triggers RTB on weak signal alone
3. Compound warning thresholds (~3 failures) – "Continue with caution" vs "Return to home" for motor warm + battery 50%
4. Maintain altitude vs Continue tracking (~3 failures) – tracking context with good battery
5. Property boundary nuance (~2 failures) – "noted but not violated" vs "violated" boundary handling
6. Hallucination (~5 failures) – the model adds "Send distress alert" or extra battery/RTB context not present in the input

Recommended next steps:

1. Add 20–30 targeted examples for each cluster above (especially compound warning thresholds)
2. For reg[193] (an internally contradictory case) – decide the ground truth before the next run
3. Consider a lora_r=64 experiment – a lower rank may reduce overfitting on hard cases
4. Try 6 epochs – loss was 0.191 at 5 epochs vs 0.140 at exp-072; more training may help
5. Review the 21 remaining regression failures for data contradictions

## Files

- `adapter_config.json` + `adapter_model.safetensors` – LoRA adapter (load on top of Qwen3-4B)
- `scorecard.json` – full evaluation results with all failure cases
- `train_meta.json` – training metadata, hyperparameters, and loss curve
- `tokenizer.*` – tokenizer files

## Part of LlamaFarm

This model is part of the LlamaFarm autonomous drone system.

