Txt2Img-MHN-VQVAE

VQVAE model for remote sensing image generation, part of the Txt2Img-MHN framework.

Model Details

  • Class: AutoencoderKL (diffusers)
  • Input/Output: 3-channel RGB images (256×256)
  • Latent channels: 512
  • Parameters: ~1.6M

Usage

from diffusers import AutoencoderKL
import torch

# Load model
vae = AutoencoderKL.from_pretrained(
    "BiliSakura/Txt2Img-MHN-VQVAE",
    ignore_mismatched_sizes=True
)

# Encode image to latent
image = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latent_dist = vae.encode(image).latent_dist
    latent = latent_dist.sample()  # (1, 512, 32, 32)

# Decode latent to image
with torch.no_grad():
    decoded = vae.decode(latent).sample  # (1, 3, 256, 256)
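Decoded samples are floating-point tensors, typically in roughly [-1, 1] for diffusers VAEs. A common post-processing step before saving or displaying is to rescale to uint8 pixel values; the snippet below is a generic sketch of that step (the exact normalization used during training is an assumption, not stated in this card):

```python
import torch

def to_uint8_image(decoded: torch.Tensor) -> torch.Tensor:
    """Map a decoded (1, 3, H, W) tensor in [-1, 1] to an (H, W, 3) uint8 image."""
    img = (decoded.clamp(-1, 1) + 1) / 2           # rescale to [0, 1]
    img = (img * 255).round().to(torch.uint8)      # quantize to [0, 255]
    return img.squeeze(0).permute(1, 2, 0)         # (1, 3, H, W) -> (H, W, 3)

# Stand-in tensor with the same shape as vae.decode(latent).sample
decoded = torch.randn(1, 3, 256, 256)
image = to_uint8_image(decoded)
print(image.shape, image.dtype)  # torch.Size([256, 256, 3]) torch.uint8
```

The resulting array can be passed directly to PIL (`Image.fromarray(image.numpy())`) for saving.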

Training

Trained on the RSICD remote sensing dataset.

Citation

@article{txt2img_mhn,
  title={Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks},
  author={Xu, Yonghao and Yu, Weikang and Ghamisi, Pedram and Kopp, Michael and Hochreiter, Sepp},
  journal={IEEE Trans. Image Process.},
  doi={10.1109/TIP.2023.3323799},
  year={2023}
}

License

Released under the MIT License; the authors state the model is intended for academic use only.
