📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,273 articles · Updated every 3 hours · View all news

All ⚡ AI Lessons (8687) ArXiv cs.AI Forbes Innovation OpenAI News Dev.to AI Hugging Face Blog Hackernoon

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks

arXiv:2603.11558v3 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulat

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design

arXiv:2603.13431v2 Announce Type: replace-cross Abstract: Computational antibody design has seen rapid methodological progress, with dozens of deep generative m

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation

arXiv:2603.17205v2 Announce Type: replace-cross Abstract: Domain-specific finetuning is essential for dense retrievers, yet not all training pairs contribute eq

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 1w ago

SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

arXiv:2603.17219v2 Announce Type: replace-cross Abstract: Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, wher

ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1w ago

The data heat island effect: quantifying the impact of AI data centers in a warming world

arXiv:2603.20897v2 Announce Type: replace-cross Abstract: The strong and continuous increase of AI-based services leads to the steady proliferation of AI data c

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

arXiv:2603.28902v1 Announce Type: new Abstract: Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusi

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence

arXiv:2603.28906v1 Announce Type: new Abstract: AGI has become the Holly Grail of AI with the promise of level intelligence and the major Tech companies around

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

Towards Computational Social Dynamics of Semi-Autonomous AI Agents

arXiv:2603.28928v1 Announce Type: new Abstract: We present the first comprehensive study of emergent social organization among AI agents in hierarchical multi-a

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Enhancing Policy Learning with World-Action Model

arXiv:2603.28955v1 Announce Type: new Abstract: This paper presents the World-Action Model (WAM), an action-regularized world model that jointly reasons over fu

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

arXiv:2603.28986v1 Announce Type: new Abstract: Current Autonomous Scientific Research (ASR) systems, despite leveraging large language models (LLMs) and agenti

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

arXiv:2603.28990v1 Announce Type: new Abstract: How much autonomy can multi-agent LLM systems sustain -- and what enables it? We present a 25,000-task computati

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

Emergence WebVoyager: Toward Consistent and Transparent Evaluation of (Web) Agents in The Wild

arXiv:2603.29020v1 Announce Type: new Abstract: Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are r

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

The Future of AI is Many, Not One

arXiv:2603.29075v1 Announce Type: new Abstract: The way we're thinking about generative AI right now is fundamentally individual. We see this not just in how us

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering

arXiv:2603.29085v1 Announce Type: new Abstract: Large language models (LLMs) remain brittle on multi-hop question answering (MHQA), where answering requires com

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

arXiv:2603.29112v1 Announce Type: new Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understand users fro

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

arXiv:2603.29139v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language int

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour

arXiv:2603.29142v1 Announce Type: new Abstract: Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale rema

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Knowledge database development by large language models for countermeasures against viruses and marine toxins

arXiv:2603.29149v1 Announce Type: new Abstract: Access to the most up-to-date information on medical countermeasures is important for the research and developme

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

SimMOF: AI agent for Automated MOF Simulations

arXiv:2603.29152v1 Announce Type: new Abstract: Metal-organic frameworks (MOFs) offer a vast design space, and as such, computational simulations play a critica

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping

arXiv:2603.29161v1 Announce Type: new Abstract: Modern web scraping struggles with dynamic, interactive websites that require more than static HTML parsing. Cur

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

arXiv:2603.29199v1 Announce Type: new Abstract: The AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture,

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States

arXiv:2603.29206v1 Announce Type: new Abstract: Routing is widely used to scale large language models, from Mixture-of-Experts gating to multi-model/tool select

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems

arXiv:2603.29211v1 Announce Type: new Abstract: In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents

arXiv:2603.29231v1 Announce Type: new Abstract: Existing benchmarks measure capability -- whether a model succeeds on a single attempt -- but production deploym