LFM2.5-VL-450M: Satellite Image Triage

Fine-tuned LFM2.5-VL-450M for on-board satellite image triage. Given a satellite image, the model outputs a JSON object with a description, priority level, reasoning, and categories, enabling autonomous downlink decisions.

Part of the automatic-downlink project for the Liquid AI x DPhi Space "AI in Space" Hackathon.

Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
import torch

model = AutoModelForImageTextToText.from_pretrained(
    "marcelo-earth/LFM2.5-VL-450M-satellite-triage",
    torch_dtype=torch.float16,
    device_map="auto",
)
# The processor must be loaded from the base model, not this checkpoint
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-450M")

image = Image.open("satellite_image.png").convert("RGB")

conversation = [
    {"role": "system", "content": [{"type": "text", "text": "You are a satellite image triage system. Analyze the image and respond ONLY with a JSON object."}]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Triage this satellite image. Respond with JSON only."},
    ]},
]

# Tokenize the conversation and move tensors to the model's device
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    return_tensors="pt", return_dict=True, tokenize=True,
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens, not the prompt
generated = output_ids[0, inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))

Output:

{"description": "Densely populated urban area with residential and commercial buildings and a small harbor", "priority": "MEDIUM", "reasoning": "Routine scene with identifiable features - standard downlink", "categories": ["urban", "infrastructure"]}
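
Downstream code would normally validate this output before acting on it. A minimal sketch, in which the fence-stripping fallback, the schema check, and the downlink policy are illustrative assumptions rather than anything specified by this card:

```python
import json

# Priority labels used by this model, in descending urgency; SKIP means
# the scene is not worth downlinking.
PRIORITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "SKIP"]

def parse_triage(raw: str) -> dict:
    """Parse the model's JSON output, tolerating markdown fences."""
    text = raw.strip()
    if text.startswith("```"):
        # Strip surrounding backticks and an optional "json" language tag
        text = text.strip("`").removeprefix("json").strip()
    result = json.loads(text)
    # Validate the triage schema described in this card
    for key in ("description", "priority", "reasoning", "categories"):
        if key not in result:
            raise ValueError(f"missing field: {key}")
    if result["priority"] not in PRIORITY_ORDER:
        raise ValueError(f"unknown priority: {result['priority']}")
    return result

def should_downlink(result: dict) -> bool:
    """Hypothetical policy: downlink everything except SKIP scenes."""
    return result["priority"] != "SKIP"
```

A stricter on-board policy could instead downlink only scenes at or above a configurable priority threshold.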

Training

  • Base model: LiquidAI/LFM2.5-VL-450M
  • Method: LoRA SFT (rank 16, alpha 32) via leap-finetune
  • Data: 20,264 caption-only samples from VRSBench, converted to triage JSON format with heuristic priority labels
  • Training: 2 epochs on Modal H100, lr=1e-4, batch size 4, vision encoder lr multiplier 0.1
  • Eval loss: 0.83 (epoch 2)
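
The vision-encoder learning-rate multiplier above can be expressed as optimizer parameter groups. A minimal PyTorch sketch, assuming the encoder lives under a module named `vision_tower` (the helper and the module name are illustrative; the actual training ran through leap-finetune):

```python
import torch

def build_param_groups(model: torch.nn.Module, lr: float = 1e-4, vision_mult: float = 0.1):
    """Split trainable parameters so the vision encoder trains at 0.1x the base lr."""
    vision, rest = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (vision if name.startswith("vision_tower") else rest).append(param)
    return [
        {"params": rest, "lr": lr},
        {"params": vision, "lr": lr * vision_mult},
    ]

# The groups plug directly into any torch optimizer, e.g.:
# optimizer = torch.optim.AdamW(build_param_groups(model))
```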

Data Pipeline

VRSBench contains 142K items across 3 task types. Only the 20,264 [caption] items were used; the VQA and referring tasks were filtered out. Captions were converted to triage JSON with keyword-based priority assignment:

| Priority | Count  | Example Keywords               |
|----------|--------|--------------------------------|
| MEDIUM   | 16,896 | urban, building, road, vehicle |
| LOW      | 1,945  | desert, barren, sparse         |
| SKIP     | 1,277  | cloud, ocean, haze             |
| HIGH     | 142    | deforestation, construction    |
| CRITICAL | 4      | fire, flood                    |
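
A minimal sketch of how such keyword-based assignment might work. The precedence order and everything beyond the example keywords in the table are assumptions; the actual keyword lists used to build the training data are not published here:

```python
# Checked in order; first match wins, MEDIUM is the fallback.
# Keyword lists beyond the table's examples are assumptions.
PRIORITY_KEYWORDS = [
    ("CRITICAL", ["fire", "flood"]),
    ("HIGH", ["deforestation", "construction"]),
    ("SKIP", ["cloud", "ocean", "haze"]),
    ("LOW", ["desert", "barren", "sparse"]),
]

def assign_priority(caption: str) -> str:
    """Assign a heuristic triage priority from a VRSBench-style caption."""
    text = caption.lower()
    for priority, keywords in PRIORITY_KEYWORDS:
        if any(kw in text for kw in keywords):
            return priority
    return "MEDIUM"
```

The heavy MEDIUM skew in the table follows directly from this kind of fallback rule: any caption without a special keyword defaults to MEDIUM.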

Evaluation vs Base Model

Tested on 4 Sentinel-2 satellite images:

| Metric              | Base Model | Fine-tuned |
|---------------------|------------|------------|
| Valid JSON          | 66%        | 100%       |
| Correct priority    | 33%        | 100%       |
| Unique descriptions | 33%        | 100%       |
| Correct schema      | 33%        | 100%       |

The base model copies few-shot examples verbatim and wraps its output in markdown fences; the fine-tuned model produces image-specific descriptions in the correct triage JSON schema.
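
The Valid JSON and Correct schema rows can be computed mechanically. A sketch of such a scorer, under the assumption that these checks mean "parses as JSON" and "has exactly the four triage fields" (the actual evaluation harness is not published, and the priority and uniqueness checks need reference labels, so they are omitted):

```python
import json

# The four fields the triage schema requires
EXPECTED_KEYS = {"description", "priority", "reasoning", "categories"}

def score_outputs(outputs: list[str]) -> dict:
    """Fraction of raw model outputs that parse as JSON and match the triage schema."""
    valid_json = 0
    correct_schema = 0
    for raw in outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. output wrapped in markdown fences
        valid_json += 1
        if isinstance(obj, dict) and set(obj) == EXPECTED_KEYS:
            correct_schema += 1
    n = len(outputs)
    return {"valid_json": valid_json / n, "correct_schema": correct_schema / n}
```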

Limitations

  • Trained on VRSBench (Google Earth imagery), not Sentinel-2 directly, so some domain gap is expected
  • Priority labels are heuristic, not human-annotated
  • Only 2 of 3 planned epochs were completed (eval loss was still decreasing)
  • The 450M parameter count limits description quality compared to larger VLMs