UAV-Self-Positioning-23M-ZCN
Task: Image feature extraction / UAV self-positioning in low-altitude urban environments
Base model: timm/vit_small_patch16_224.augreg_in1k
Backbone: ViT-S
Library: timm, PyTorch
Dataset: Dmmm997/DenseUAV (DenseUAV)
Training Performance: Weights & Biases
License: Apache-2.0
Internship organization: Institute of Mathematical and Computational Sciences (IMACS), Ho Chi Minh City University of Technology, Vietnam
Supervisor: M.Sc. NGUYEN VAN GIA THINH
Model description
UAV-Self-Positioning-23M-ZCN is an image feature extraction model fine-tuned from the backbone timm/vit_small_patch16_224.augreg_in1k for vision-based UAV self-positioning in low-altitude urban environments.
This model is trained from the official DenseUAV baseline with a ViT-S backbone,
using the code and training pipeline from the repository Dmmm1997/DenseUAV and the DenseUAV dataset (Dmmm997/DenseUAV). All credits for the dataset, baseline architecture, and evaluation protocol belong to the authors of DenseUAV. In this work I:
- configure `baseline/opts.yaml` for this experiment,
- train the model using the original baseline training scripts,
- and export the checkpoint as a model on Hugging Face.
The training performance logs are reported with Weights & Biases; see the run at DenseUAV Non-GPS Training Performance.
If you use this model in research or applications, please cite the DenseUAV paper and repository (see Citation below).
Origin and citation (important)
This model is fully based on:
- The original DenseUAV repository: Dmmm1997/DenseUAV
- The paper: Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments (IEEE Transactions on Image Processing, 2024), arxiv.org/abs/2201.09201
If you use this model, please cite:
@misc{dai2023visionbaseduavselfpositioninglowaltitude,
title={Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments},
author={Ming Dai and Enhui Zheng and Zhenhua Feng and Jiedong Zhuang and Wankou Yang},
year={2023},
eprint={2201.09201},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2201.09201},
}
And the repository:
Dmmm1997. DenseUAV: Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments. GitHub, 2023.
https://github.com/Dmmm1997/DenseUAV
I am not the author of DenseUAV; I only retrain and publish a checkpoint under the same Apache-2.0 license.
Quick usage
The model is published as an image feature extractor on Hugging Face, similar to the models under the Image Feature Extraction task on huggingface.co/models.
Example usage with timm (PyTorch):
- The uploaded checkpoint was saved from a wrapper model, so the ViT weights are stored under the prefix `backbone.backbone.`.
- The snippet below strips that prefix and loads the backbone successfully (prints `torch.Size([1, 384])`).
import os
import torch
from PIL import Image
from timm import create_model
from torchvision import transforms
device = "cuda" if torch.cuda.is_available() else "cpu"
# 1. Create ViT backbone
model = create_model(
    "vit_small_patch16_224",
    pretrained=False,
    num_classes=0,  # feature extractor
).to(device)
# 2. Download checkpoint from Hugging Face
state_dict = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/Bancie/UAV-Self-Positioning-23M-ZCN/resolve/main/UAV_SelfPositioning_23M_ZCN.pth",
    map_location=device,
)
# 3. Strip wrapper prefix and load ViT weights
vit_prefix = "backbone.backbone."
vit_state_dict = {k[len(vit_prefix):]: v for k, v in state_dict.items() if k.startswith(vit_prefix)}
model.load_state_dict(vit_state_dict, strict=False)
model.eval()
# 4. Preprocess a local UAV-view image (224x224)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
image_path = "/path/to/your/drone_image.png"
assert os.path.exists(image_path), f"Image not found: {image_path}"
image = Image.open(image_path).convert("RGB")
x = transform(image).unsqueeze(0).to(device)
# 5. Extract feature vector
with torch.no_grad():
    feat = model(x)  # [1, D]
    feat = torch.nn.functional.normalize(feat, dim=-1)
print(feat.shape)  # torch.Size([1, 384])
In the DenseUAV setting, these features are used to:
- build embeddings for UAV-view and satellite-view images,
- compute distances (e.g., cosine or Euclidean),
- and evaluate Recall and SDM as in the original DenseUAV code.
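The retrieval step above can be sketched with plain PyTorch. This is a minimal illustration with random stand-in embeddings, not the DenseUAV evaluation code: in practice the vectors come from the feature extractor above and the ground-truth UAV-to-satellite pairs come from the dataset annotations.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical L2-normalized embeddings: 4 UAV queries, 10 satellite gallery images.
query = F.normalize(torch.randn(4, 384), dim=-1)
gallery = F.normalize(torch.randn(10, 384), dim=-1)
# Make gallery images 0..3 the true matches for queries 0..3.
gallery[:4] = query

# Cosine similarity matrix on normalized vectors: [num_queries, num_gallery]
sim = query @ gallery.t()

# Recall@1: fraction of queries whose top-ranked gallery image is the true match.
labels = torch.arange(4)   # ground-truth gallery index per query
top1 = sim.argmax(dim=1)
recall_at_1 = (top1 == labels).float().mean().item()
print(recall_at_1)  # 1.0 for these constructed matches
```

For the official Recall@K and SDM numbers, use the evaluation scripts in the DenseUAV repository (see Evaluation below).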
Training data
- Dataset: DenseUAV (Dmmm997/DenseUAV)
- Data type: UAV-view and satellite-view images captured in low-altitude urban environments
- Split: train / query / gallery as described in the original DenseUAV README
- Preprocessing: follows the baseline pipeline from the DenseUAV repository (resize, augmentation, normalization)
To fully reproduce training and evaluation, please clone the original repository and follow its instructions.
Training procedure
- Code: directly based on the baseline implementation under `baseline/` in Dmmm1997/DenseUAV.
- Backbone: `vit_small_patch16_224.augreg_in1k` from `timm`.
- Config: a customized `baseline/opts.yaml` for this experiment.
- Scripts: trained using the original baseline scripts (for example `train.py` / `train_test_local.sh`) as suggested by the DenseUAV authors.
The architecture and loss functions follow the baseline; only minor hyperparameter choices and the random seed were adapted to my compute resources.
Intended uses and limitations
Intended uses
- Research on UAV self-positioning and cross-view geo-localization.
- Backbone / feature extractor for new methods (e.g., variants or extensions built on top of DenseUAV).
- Quick experiments on DenseUAV without training from scratch.
Not intended for
- Out-of-domain UAV positioning (very different cities, weather conditions, altitudes, or sensors).
- Safety-critical applications without proper validation, calibration, and monitoring.
Limitations
- Trained on a specific dataset; may not generalize to all UAV scenarios.
- Processes single images; temporal information from video sequences is not explicitly modeled.
Evaluation
The evaluation protocol follows the original DenseUAV repository (Recall@K, SDM, etc.).
To re-evaluate this checkpoint on DenseUAV:
- Clone the DenseUAV repository and prepare the dataset as described there.
- Place this checkpoint under `checkpoints/<name>/`.
- Run:
python test.py --name <name> --test_dir <dataset_root>/test
python evaluate_gpu.py
python evaluateDistance.py --root_dir <dataset_root>
Replace <name> with your checkpoint directory name and <dataset_root> with the path to the DenseUAV dataset. For details, please refer to the original DenseUAV README.
License
- Model checkpoint: released under Apache-2.0, the same license as the original DenseUAV project.
- Original code and dataset: owned by the DenseUAV authors; see the `LICENSE` file in Dmmm1997/DenseUAV.
By using this model, you agree to comply with the Apache-2.0 license of the original DenseUAV project.