📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 2,972 articles · Updated every 3 hours

Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verification

arXiv:2604.04190v1 Announce Type: new Abstract: Knowledge Graphs (KGs) serve as a critical foundation for AI systems, yet their automated construction inevitabl…

ArXiv cs.AI 📄 Paper 8h ago

Don't Blink: Evidence Collapse during Multimodal Reasoning

arXiv:2604.04207v1 Announce Type: new Abstract: Reasoning VLMs can become more accurate while progressively losing visual grounding as they think. This creates…

ArXiv cs.AI 📄 Paper 8h ago

TimeSeek: Temporal Reliability of Agentic Forecasters

arXiv:2604.04220v1 Announce Type: new Abstract: We introduce TimeSeek, a benchmark for studying how the reliability of agentic LLM forecasters changes over a pr…

ArXiv cs.AI 📄 Paper 8h ago

Pedagogical Safety in Educational Reinforcement Learning: Formalizing and Detecting Reward Hacking in AI Tutoring Systems

arXiv:2604.04237v1 Announce Type: new Abstract: Reinforcement learning (RL) is increasingly used to personalize instruction in intelligent tutoring systems, yet…

ArXiv cs.AI 📄 Paper 8h ago

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

arXiv:2604.04247v1 Announce Type: new Abstract: Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inf…

ArXiv cs.AI 📄 Paper 8h ago

MC-CPO: Mastery-Conditioned Constrained Policy Optimization

arXiv:2604.04251v1 Announce Type: new Abstract: Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learn…

ArXiv cs.AI 📄 Paper 8h ago

Context Engineering: A Practitioner Methodology for Structured Human-AI Collaboration

arXiv:2604.04258v1 Announce Type: new Abstract: The quality of AI-generated output is often attributed to prompting technique, but extensive empirical observati…

ArXiv cs.AI 📄 Paper 8h ago

Beyond Fluency: Toward Reliable Trajectories in Agentic IR

arXiv:2604.04269v1 Announce Type: new Abstract: Information Retrieval is shifting from passive document ranking toward autonomous agentic workflows that operate…

ArXiv cs.AI 📄 Paper 8h ago

InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

arXiv:2604.04274v1 Announce Type: new Abstract: Causal inference is central to scientific discovery, yet choosing appropriate methods remains challenging becaus…

ArXiv cs.AI 📄 Paper 8h ago

Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

arXiv:2604.04281v1 Announce Type: new Abstract: Width expansion offers a practical route to reuse smaller causal-language-model checkpoints, but selecting a wid…

ArXiv cs.AI 📄 Paper 8h ago

PanLUNA: An Efficient and Robust Query-Unified Multimodal Model for Edge Biosignal Intelligence

arXiv:2604.04297v1 Announce Type: new Abstract: Physiological foundation models (FMs) have shown promise for biosignal representation learning, yet most remain…

ArXiv cs.AI 📄 Paper 8h ago

RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers

arXiv:2604.04324v1 Announce Type: new Abstract: Reconstructing numerical simulations from control systems research papers is often hindered by underspecified pa…

ArXiv cs.AI 📄 Paper 8h ago

Soft Tournament Equilibrium

arXiv:2604.04328v1 Announce Type: new Abstract: The evaluation of general-purpose artificial agents, particularly those based on large language models, presents…

ArXiv cs.AI 📄 Paper 8h ago

Thermodynamic-Inspired Explainable GeoAI: Uncovering Regime-Dependent Mechanisms in Heterogeneous Spatial Systems

arXiv:2604.04339v1 Announce Type: new Abstract: Modeling spatial heterogeneity and associated critical transitions remains a fundamental challenge in geography…

ArXiv cs.AI 📄 Paper 8h ago

Implementing surrogate goals for safer bargaining in LLM-based agents

arXiv:2604.04341v1 Announce Type: new Abstract: Surrogate goals have been proposed as a strategy for reducing risks from bargaining failures. A surrogate goal i…

ArXiv cs.AI 📄 Paper 8h ago

Domain-Contextualized Inference: A Computable Graph Architecture for Explicit-Domain Reasoning

arXiv:2604.04344v1 Announce Type: new Abstract: We establish a computation-substrate-agnostic inference architecture in which domain is an explicit first-class…

ArXiv cs.AI 📄 Paper 8h ago

RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets

arXiv:2604.04347v1 Announce Type: new Abstract: 2026 has brought an explosion of interest in LLM-guided evolution of agentic artifacts, with systems like GEPA a…

ArXiv cs.AI 📄 Paper 8h ago

REAM: Merging Improves Pruning of Experts in LLMs

arXiv:2604.04356v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest mo…

ArXiv cs.AI 📄 Paper 8h ago

Decocted Experience Improves Test-Time Inference in LLM Agents

arXiv:2604.04373v1 Announce Type: new Abstract: There is growing interest in improving LLMs without updating model parameters. One well-established direction is…

ArXiv cs.AI 📄 Paper 8h ago

Optimizing Service Operations via LLM-Powered Multi-Agent Simulation

arXiv:2604.04383v1 Announce Type: new Abstract: Service system performance depends on how participants respond to design choices, but modeling these responses i…

ArXiv cs.AI 📄 Paper 8h ago

Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis

arXiv:2604.04386v1 Announce Type: new Abstract: Numerous math benchmarks exist to evaluate LLMs' mathematical capabilities. However, most involve extensive manu…

ArXiv cs.AI 📄 Paper 8h ago

Gradual Cognitive Externalization: A Framework for Understanding How Ambient Intelligence Externalizes Human Cognition

arXiv:2604.04387v1 Announce Type: new Abstract: Developers are publishing AI agent skills that replicate a colleague's communication style, encode a supervisor'…

ArXiv cs.AI 📄 Paper 8h ago

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

arXiv:2604.04399v1 Announce Type: new Abstract: Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, y…

ArXiv cs.AI 📄 Paper 8h ago

MolDA: Molecular Understanding and Generation via Large Language Diffusion Model

arXiv:2604.04403v1 Announce Type: new Abstract: Large Language Models (LLMs) have significantly advanced molecular discovery, but existing multimodal molecular…