VQVAE model for remote sensing image generation, part of the Txt2Img-MHN framework.
AutoencoderKL (diffusers)
from diffusers import AutoencoderKL
import torch

# Load model
vae = AutoencoderKL.from_pretrained(
    "BiliSakura/Txt2Img-MHN-VQVAE",
    ignore_mismatched_sizes=True
)

# Encode image to latent
image = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latent_dist = vae.encode(image).latent_dist
    latent = latent_dist.sample()  # (1, 512, 32, 32)

# Decode latent to image
with torch.no_grad():
    decoded = vae.decode(latent).sample  # (1, 3, 256, 256)
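The example above encodes a random tensor. To run a real image through the model, the pixel data has to be converted to the same (1, 3, H, W) layout first. The sketch below shows one way to do that with NumPy. It assumes the model expects RGB inputs scaled to [-1, 1], a common convention for diffusers autoencoders that this model card does not confirm. The helper names `to_model_input` and `to_uint8_image` are hypothetical and not part of diffusers.

```python
import numpy as np


def to_model_input(image_uint8):
    """Convert an (H, W, 3) uint8 RGB array to a (1, 3, H, W) float32 array.

    Assumption: the autoencoder expects values in [-1, 1].
    """
    if image_uint8.ndim != 3 or image_uint8.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    # Map 0..255 to -1..1.
    scaled = image_uint8.astype(np.float32) / 127.5 - 1.0
    # Reorder HWC to CHW and add a leading batch axis.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[None])


def to_uint8_image(batch):
    """Convert a (1, 3, H, W) float array in [-1, 1] to an (H, W, 3) uint8 array."""
    # Decoder outputs can overshoot the nominal range, so clip first.
    chw = np.clip(batch[0], -1.0, 1.0)
    return np.rint((chw + 1.0) * 127.5).astype(np.uint8).transpose(1, 2, 0)
```

With these helpers, `torch.from_numpy(to_model_input(array))` can stand in for the random `image` tensor, and `to_uint8_image(decoded.cpu().numpy())` turns the decoder output back into an array that can be saved or displayed.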
Trained on the RSICD remote sensing dataset.
@article{txt2img_mhn,
  title={Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks},
  author={Xu, Yonghao and Yu, Weikang and Ghamisi, Pedram and Kopp, Michael and Hochreiter, Sepp},
  journal={IEEE Trans. Image Process.},
  doi={10.1109/TIP.2023.3323799},
  year={2023}
}
MIT License - for academic use only.