Foundational & Modern AI Research (Curated) - a AditiShashiKhare Collection

Open-Source Foundations for Modern AI Systems

Diagnostic & Evaluation Datasets (Curated)

Foundational & Modern AI Research (Curated)

updated about 13 hours ago

A curated selection of foundational and modern AI research papers that meaningfully influence how real-world AI systems are designed, evaluated, and g

Upvote

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 10
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT

Paper • 2210.04186 • Published Oct 9, 2022
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 4
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published Jan 6, 2025 • 35
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published Jun 18, 2024 • 37
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 34
Toolformer: Language Models Can Teach Themselves to Use Tools

Paper • 2302.04761 • Published Feb 9, 2023 • 12
Inner Monologue: Embodied Reasoning through Planning with Language Models

Paper • 2207.05608 • Published Jul 12, 2022
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 179
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28, 2025 • 63
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20, 2025 • 43
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

Paper • 2508.01780 • Published Aug 3, 2025 • 21
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments

Paper • 2510.01179 • Published Oct 1, 2025 • 28
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Paper • 2507.12806 • Published Jul 17, 2025 • 21
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

Paper • 2509.09734 • Published Sep 10, 2025 • 16
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits

Paper • 2504.03767 • Published Apr 2, 2025 • 3
Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies

Paper • 2504.08623 • Published Apr 11, 2025 • 1
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

Paper • 2512.19432 • Published Dec 22, 2025 • 13
A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP)

Paper • 2505.02279 • Published May 4, 2025 • 1
Adaptation of Agentic AI

Paper • 2512.16301 • Published Dec 18, 2025 • 108
AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published Aug 28, 2025 • 38
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Paper • 2504.08066 • Published Apr 10, 2025 • 22
Small Language Models are the Future of Agentic AI

Paper • 2506.02153 • Published Jun 2, 2025 • 24
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI

Paper • 2505.19443 • Published May 26, 2025 • 15
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 665
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge

Paper • 2505.10468 • Published May 15, 2025 • 10
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 377
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 628
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25, 2025 • 350
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 550
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 308
Continued domain-specific pre-training of protein language models for pMHC-I binding prediction

Paper • 2507.13077 • Published Jul 16, 2025 • 504
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 302
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 282
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 276
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 264
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8, 2025 • 290
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 263
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 273
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 274
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 253
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 265
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 322
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 254
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 247
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 240
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 233
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 238
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3, 2025 • 225
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published Mar 5, 2025 • 233
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Paper • 2505.17894 • Published May 23, 2025 • 220
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 217
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 242
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 229
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 217
CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published Oct 23, 2024 • 209
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 216
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 211
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Paper • 2509.03867 • Published Sep 4, 2025 • 213
3D Gaussian Splatting for Real-Time Radiance Field Rendering

Paper • 2308.04079 • Published Aug 8, 2023 • 199
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 194
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 199
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published Feb 13, 2025 • 192
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 190
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 191
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 190
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193
Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 192
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published Jan 11 • 215
LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 189
OmniSVG: A Unified Scalable Vector Graphics Generation Model

Paper • 2504.06263 • Published Apr 8, 2025 • 186
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 189
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 191
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 182

Upvote

Collection guide
Browse collections