Physical AI Papers
List of papers we review/read for building physical AI models
Paper • 2103.15691 • Published • 4
Note: Speeds up transformer-based video models by reducing the complexity of attention from O((T*H*W)**2) to O((H*W)**2 + T**2).
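To make the complexity figures in the note concrete, here is a minimal sketch that counts query-key pairs for joint attention over all T*H*W tokens versus the factorized expression quoted in the note. The function names and the example grid size are assumptions chosen for illustration; they are not taken from the paper's code.

```python
def joint_attention_pairs(t: int, h: int, w: int) -> int:
    """Query-key pairs when every space-time token attends to every other one."""
    n_tokens = t * h * w
    return n_tokens ** 2


def factorized_attention_pairs(t: int, h: int, w: int) -> int:
    """Pairs under the note's expression: one spatial pass plus one temporal pass.

    Assumption: this mirrors O((H*W)**2 + T**2) literally, i.e. a single
    frame's spatial attention plus attention across T per-frame summaries.
    """
    spatial = (h * w) ** 2
    temporal = t ** 2
    return spatial + temporal


# Hypothetical clip: 16 frames, each split into a 14 x 14 grid of patches.
joint = joint_attention_pairs(16, 14, 14)
factorized = factorized_attention_pairs(16, 14, 14)
print(joint, factorized)
```

For this assumed clip size the joint count is 9,834,496 pairs and the factorized count is 38,672, which is where the speed-up comes from.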
DINO-Foresight: Looking into the Future with DINO
Paper • 2412.11673 • Published • 1
Note: Predicts future embeddings from a ViT model, used as a latent world model.
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
Paper • 2601.04575 • Published • 12
Note: Decodes all actions from the same latent embedding with a transformer instead of a linear layer or MLP.
Learning Long-Context Diffusion Policies via Past-Token Prediction
Paper • 2505.09561 • Published
Note: For smoother action prediction, predict both past and future actions from a given latent embedding.
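One way to picture the idea in the note is as a change to the training target: instead of only the upcoming actions, the target window also reaches back over actions that were already taken. The sketch below only builds such a window from a recorded action list; the function name `past_and_future_targets` and its parameters are hypothetical, and it is not the paper's implementation.

```python
def past_and_future_targets(actions, t, n_past, n_future):
    """Return the actions the policy would be asked to predict at step t.

    The window covers n_past actions before t and n_future actions from t on,
    clipped to the bounds of the recorded trajectory.
    """
    start = max(0, t - n_past)
    end = min(len(actions), t + n_future)
    return actions[start:end]


# Hypothetical trajectory of 10 scalar actions, labeled 0..9.
trajectory = list(range(10))
print(past_and_future_targets(trajectory, t=5, n_past=2, n_future=3))
```

With `n_past=0` the window reduces to the usual future-only action chunk, so the past-prediction term can be treated as an extension of a standard chunked target.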
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Paper • 2412.15109 • Published
Note: Decodes future frames from a latent foresight token embedding, and decodes actions from the latent foresight.
Decision Transformer: Reinforcement Learning via Sequence Modeling
Paper • 2106.01345 • Published • 3
Note: Treats the control problem as a long sequence of state, action, and reward embeddings from offline data.
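A minimal sketch of how an offline trajectory can be flattened into one token sequence for a sequence model. It assumes the reward signal enters as returns-to-go (the sum of rewards from each step onward), which is how Decision Transformer conditions its predictions. The helper names and the tuple-based token format are illustrative assumptions, not the paper's code.

```python
def returns_to_go(rewards):
    """Sum of rewards from each timestep to the end of the trajectory."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave_trajectory(states, actions, rewards):
    """Flatten a trajectory into (return-to-go, state, action) triples in time order."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", g), ("state", s), ("action", a)])
    return sequence


# Hypothetical 3-step trajectory with symbolic states and actions.
seq = interleave_trajectory(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
print(seq)
```

In a real model each tuple would be replaced by a learned embedding, and a causal transformer would be trained to predict the action token at each step from everything before it.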
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Paper • 2106.02039 • Published • 2
Note: Treats the control problem as a long sequence of state, action, and reward embeddings from offline data.
A Generalist Agent
Paper • 2205.06175 • Published • 4
Note: Trains a multimodal causal transformer on data from different environments (Atari, etc.) and adds a policy head for control given a task.
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 157
A Survey on Vision-Language-Action Models for Embodied AI
Paper • 2405.14093 • Published • 1
π_0: A Vision-Language-Action Flow Model for General Robot Control
Paper • 2410.24164 • Published • 31
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 29
DINOv3
Paper • 2508.10104 • Published • 305