	---
	license: mit
	datasets:
	- maxin-cn/SkyTimelapse
	- ltzheng/minecraft
	language:
	- en
	base_model:
	- facebook/DiT-XL-2-512
	- facebook/DiT-XL-2-256
	---
	<p align="center">
	<h2 align="center"> GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation </h2>
	<p align="center">
	<a href="https://snehalstomar.github.io/">Snehal Singh Tomar</a>
	.
	<a href="https://alexgraikos.github.io/">Alexandros Graikos</a>
	.
	<a href="https://www.linkedin.com/in/arjun-krishna-a3573710/">A. Krishna</a>
	.
	<a href="https://www3.cs.stonybrook.edu/~samaras/">Dimitris Samaras</a>
	.
	<a href="https://www3.cs.stonybrook.edu/~mueller/">Klaus Mueller</a>
	</p>
	<p align="center"> <strong>Transactions on Machine Learning Research (TMLR) 2026</strong></p>
	<p align="center">
	Stony Brook University
	</p>
	<h3 align="center">

	[![arXiv](https://img.shields.io/badge/arXiv-blue?logo=arxiv&color=%23B31B1B)](https://arxiv.org/abs/2512.21276)
	[![ProjectPage](https://img.shields.io/badge/Project_Page-GriDiT-blue)]()
	[![GitHub-GriDiT](https://img.shields.io/badge/GitHub-181717?logo=github&logoColor=ffffff)](https://github.com/snehalstomar/GriDiT)

	<div align="center"></div>
	</h3>
	</p>

	<p align="center">
	<a href="">
	<img src="teaser.png" width="100%">
	</a>
	</p>

	<h5 align="left">
	<em>TL;DR:</em> State-of-the-art image sequence generation models treat image sequences as large tensors of ordered frames.
	In contrast, our method factorizes image sequence generation into two stages. First, we learn to model
	the dynamics of the sequence at low resolution, treating the frames as subsampled image grids. Second, we
	learn to super-resolve individual frames at high resolution. By using the DiT’s self-attention mechanism to model
	dynamics across frames and pairing it with our sampling strategy, our method yields superior synthesis quality
	for sequences of arbitrary length while significantly reducing sampling time and training data requirements.
	</h5>
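
To make the "frames as subsampled image grids" idea concrete, here is a minimal NumPy sketch of how a short sequence could be packed into a single grid image and unpacked again. It is an illustration only, not the code from our repository: the function names `frames_to_grid` and `grid_to_frames`, the row-major cell order, and the naive strided subsampling are all assumptions made for this example.

```python
import numpy as np


def frames_to_grid(frames: np.ndarray, rows: int, cols: int, stride: int) -> np.ndarray:
    """Pack rows*cols subsampled frames into one grid image (hypothetical helper).

    frames: array of shape (T, H, W, C) with T == rows * cols.
    stride: keep every `stride`-th pixel along height and width. This naive
            subsampling is an assumption; a real pipeline may downsample differently.
    """
    t, _, _, c = frames.shape
    if t != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {t}")
    low = frames[:, ::stride, ::stride, :]  # (T, h, w, C) low-resolution frames
    _, h, w, _ = low.shape
    # Frame r * cols + k lands in grid cell (row r, column k): row-major order.
    grid = low.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


def grid_to_frames(grid: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Inverse of frames_to_grid: split a grid image back into ordered frames."""
    gh, gw, c = grid.shape
    h, w = gh // rows, gw // cols
    cells = grid.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return cells.reshape(rows * cols, h, w, c)


# Example: 16 frames of 256x256 RGB, subsampled by 4 into a 4x4 grid.
frames = np.random.rand(16, 256, 256, 3).astype(np.float32)
grid = frames_to_grid(frames, rows=4, cols=4, stride=4)
print(grid.shape)  # (256, 256, 3)
```

In this layout a single image-sized tensor carries all 16 low-resolution frames, so an image diffusion model's self-attention can relate any cell to any other. The second stage would then super-resolve each recovered cell independently.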

	## Code and Execution Details

	Please visit our [GitHub repository](https://github.com/snehalstomar/GriDiT) for the code and instructions on running it.

	## Citation

	Please cite our work as:

	```bibtex
	@article{tomar2026gridit,
	  title={GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation},
	  author={Snehal Singh Tomar and Alexandros Graikos and Arjun Krishna and Dimitris Samaras and Klaus Mueller},
	  journal={Transactions on Machine Learning Research},
	  issn={2835-8856},
	  year={2026},
	  url={https://openreview.net/forum?id=QLD47Ou5lp}
	}
	```