Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 13 days ago • 118
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published May 4 • 348
WildDet3D Collection This is the collection of WildDet3D artifacts, including demos, model checkpoints and data. https://github.com/allenai/WildDet3D • 8 items • Updated Apr 13 • 20
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published Mar 25 • 19
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26 • 9
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics Paper • 2602.19313 • Published Feb 22 • 26
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 15 items • Updated Mar 2 • 40
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20, 2025 • 64
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index Paper • 2506.12229 • Published Jun 13, 2025 • 3
DocRAG Datasets Collection Processed ("Unified") datasets used in DocRAG for training or inference purposes. • 12 items • Updated Jun 14, 2025 • 1
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Paper • 2504.15280 • Published Apr 21, 2025 • 25
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 310
Synthetic Object Compositions for Det / Seg / Grounding Collection Dataset collections for the paper: https://github.com/weikaih04/Synthetic-Detection-Segmentation-Grounding-Data • 8 items • Updated Mar 2 • 2