Phenaki: Variable Length Video Generation From Open Domain Textual Description
Paper • arXiv:2210.02399
The embeddings of image and video patches from raw frames x are processed by a spatial transformer and then a causal transformer (autoregressive in time) to generate video tokens.
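The two-stage ordering in the caption can be illustrated with a toy sketch. This is not the paper's implementation: the function names (`attend`, `encode`) are hypothetical, attention uses identity projections instead of learned weights, there is a single head and a single layer, and the step that turns the outputs into discrete tokens is omitted. It only shows patches attending within a frame first, and then each patch position attending causally across frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens, causal=False):
    """Single-head self-attention with identity projections (illustrative only)."""
    dim = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # With causal=True, position i only sees positions 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in visible]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return out


def encode(video):
    """video[t][p] is the embedding (list of floats) of patch p in frame t."""
    # Spatial step: patches attend to each other within a single frame.
    spatial = [attend(frame) for frame in video]
    num_frames, num_patches = len(spatial), len(spatial[0])
    # Temporal step: each patch position attends causally across frames.
    columns = [
        attend([spatial[t][p] for t in range(num_frames)], causal=True)
        for p in range(num_patches)
    ]
    # Transpose back to the [frame][patch] layout.
    return [[columns[p][t] for p in range(num_patches)] for t in range(num_frames)]


# Toy input: 3 frames, 2 patches per frame, embedding size 2.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.2, 0.9], [0.3, 0.1]],
]
encoded = encode(video)
```

Because the temporal step is causal, the encoded representation of a frame depends only on that frame and the frames before it, so replacing the last frame of `video` leaves the outputs for the first two frames unchanged.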