📰 ArXiv cs.AI

20,329 articles · Updated every 3 hours

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation
arXiv:2606.12616v1 Announce Type: new Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave lar
ArXiv cs.AI 📄 Paper 13h ago
"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms
arXiv:2606.12618v1 Announce Type: new Abstract: Robust lie detectors for language models could enable powerful techniques for auditing, monitoring, and post-hoc
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation
arXiv:2606.12657v1 Announce Type: new Abstract: Human mobility data is important for transportation, urban planning, and epidemic control, but large-scale traje
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
arXiv:2606.12674v1 Announce Type: new Abstract: Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
From AGI to ASI
arXiv:2606.12683v1 Announce Type: new Abstract: Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculatio
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System
arXiv:2606.12702v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 13h ago
Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI
arXiv:2606.12713v1 Announce Type: new Abstract: Claims that artificial general intelligence has already arrived and claims that it remains decades away are ofte
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism
arXiv:2606.12721v1 Announce Type: new Abstract: Inferring others' beliefs requires more than reading surface signals; it requires tracking who told them what, i
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
arXiv:2606.12730v1 Announce Type: new Abstract: Anticipating LLM behavioral tendencies from low-cost psychometric probes is critical for safe deployment, but on
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales
arXiv:2606.12736v1 Announce Type: new Abstract: AI agents are increasingly being developed to accelerate scientific discovery, yet their practical capabilities
ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 13h ago
Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices
arXiv:2606.12742v1 Announce Type: new Abstract: Wearable healthcare devices are the fastest-growing Internet of Things (IoT) sector. Many automated healthcare s
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
Prefill Awareness in Large Language Models
arXiv:2606.12747v1 Announce Type: new Abstract: Safety-relevant studies of language models, including alignment and jailbreaking evaluations and AI control prot
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
Constructing Evaluation Datasets for Procedural Reasoning: Balancing Naturalness, Grounding, and Multi-Hop Coverage
arXiv:2606.12767v1 Announce Type: new Abstract: Evaluating procedural reasoning in AI-supported learning systems requires question-answer datasets that are both
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
A Tutorial on World Models and Physical AI
arXiv:2606.12783v1 Announce Type: new Abstract: World modeling is emerging as a central principle for building intelligent systems capable of prediction, reason
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements
arXiv:2606.12797v1 Announce Type: new Abstract: Agentic large language model systems that autonomously invoke tools, maintain persistent memory, and execute mul
ArXiv cs.AI 📄 Paper 13h ago
MLUBench: A Benchmark for Lifelong Unlearning Evaluation in MLLMs
arXiv:2606.12809v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) are trained on massive multimodal data, making data unlearning increasi
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents
arXiv:2606.12817v1 Announce Type: new Abstract: Understanding the digital world on mobile devices is shifting from static UI perception to dynamic action compre
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models
arXiv:2606.12821v1 Announce Type: new Abstract: Environmental scientists spend disproportionate effort on data wrangling rather than analysis, and AI agents tha
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 13h ago
Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics
arXiv:2606.12828v1 Announce Type: new Abstract: Do research topics in artificial intelligence grow gradually, or do they advance through abrupt, detectable jump
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 13h ago
Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement
arXiv:2606.12834v1 Announce Type: new Abstract: As scientific workflows shift from deterministic executables to LLM-based agents, the development practices on o