Spatia: Video Generation with Updatable Spatial Memory

Long-horizon, spatially consistent video generation enabled by persistent 3D scene point clouds and dynamic-static disentanglement.

1The University of Sydney   2Microsoft Research   3HKUST   4University of Waterloo
*Equal Contribution

arXiv   Project Page   Code


📖 Abstract

Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory.

Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates the memory through visual SLAM; the memory retains the static scene, while dynamic entities are left to the generator. This dynamic-static disentanglement enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities.
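To make this loop concrete, below is a minimal conceptual sketch of how such a memory-conditioned generation cycle could be organized. The `generator`, `slam`, and `renderer` callables and the `SpatialMemory` class are hypothetical stand-ins for the clip generator, visual-SLAM module, and point-cloud renderer described above; none of these names come from the released code.

```python
# Minimal conceptual sketch of Spatia's generation loop (not the official API).

class SpatialMemory:
    """Persistent 3D scene point cloud used as spatial memory across clips."""
    def __init__(self):
        self.points = []  # static scene points, e.g. (x, y, z, r, g, b) tuples

    def update(self, new_points):
        # Fuse static geometry reconstructed from the latest clip; dynamic
        # entities are deliberately kept out so the memory stays stable.
        self.points.extend(new_points)


def generate_long_video(generator, slam, renderer, first_frame, camera_plan):
    """Iteratively generate clips conditioned on spatial memory (hypothetical API)."""
    memory = SpatialMemory()
    clips, context = [], first_frame
    for poses in camera_plan:                        # one camera trajectory per clip
        condition = renderer(memory, poses)          # project memory into 2D guidance
        clip = generator(context, condition, poses)  # generate the next clip
        memory.update(slam(clip, poses))             # visual SLAM -> new static points
        clips.append(clip)
        context = clip                               # last clip seeds the next one
    return clips
```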

Furthermore, Spatia enables applications such as:

  • Explicit Camera Control (see the sketch after this list)
  • 3D-Aware Interactive Editing
  • Long-horizon Scene Exploration
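
As an illustration of the explicit camera-control application, the snippet below builds a simple dolly-forward camera plan that could be fed to a loop like the one sketched above. The pose convention and the `look_at_pose` helper are hypothetical, not part of Spatia's released interface.

```python
import numpy as np

def look_at_pose(eye, target, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world pose that looks from `eye` toward `target`."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    forward = (target - eye) / np.linalg.norm(target - eye)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1] = right, true_up
    pose[:3, 2], pose[:3, 3] = -forward, eye
    return pose

# Three clips of 16 frames each, dollying straight forward along -z.
camera_plan = [
    [look_at_pose(eye=(0.0, 1.5, -z), target=(0.0, 1.5, -z - 1.0))
     for z in np.linspace(start, start + 2.0, 16)]
    for start in (0.0, 2.0, 4.0)
]
```

Each inner list is one clip's camera trajectory; in the loop above it would condition one generated clip and the SLAM update that follows.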

[Figure: Spatia teaser]

Citation

If you find this project useful, please cite the paper.

@inproceedings{zhao2026spatia,
  title={Spatia: Video Generation with Updatable Spatial Memory},
  author={Zhao, Jinjing and Wei, Fangyun and Liu, Zhening and Zhang, Hongyang and Xu, Chang and Lu, Yan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

Β© 2025 Spatia Project. Licensed under CC BY-SA 4.0.
