📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,273 articles · Updated every 3 hours

arXiv:2604.01128v1 Announce Type: cross Abstract: This paper introduces the first systematic evaluation framework for quantifying the quality and risks of paper

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 1w ago

Looking into a Pixel by Nonlinear Unmixing -- A Generative Approach

arXiv:2604.01141v1 Announce Type: cross Abstract: Due to the large footprint of pixels in remote sensing imagery, hyperspectral unmixing (HU) has become an impo

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning

arXiv:2604.01152v1 Announce Type: cross Abstract: We present Brainstacks, a modular architecture for continual multi-domain fine-tuning of large language models

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 1w ago

AdaLoRA-QAT: Adaptive Low-Rank and Quantization-Aware Segmentation

arXiv:2604.01167v1 Announce Type: cross Abstract: Chest X-ray (CXR) segmentation is an important step in computer-aided diagnosis, yet deploying large foundatio

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning

arXiv:2604.01170v1 Announce Type: cross Abstract: While test-time scaling has enabled large language models to solve highly difficult tasks, state-of-the-art re

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Screening Is Enough

arXiv:2604.01178v1 Announce Type: cross Abstract: A core limitation of standard softmax attention is that it does not define a notion of absolute query--key rel

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

arXiv:2604.01179v1 Announce Type: cross Abstract: Foundation vision-language models are becoming increasingly relevant to robotics because they can provide rich

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget

arXiv:2604.01195v1 Announce Type: cross Abstract: Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering compl

ArXiv cs.AI 💻 AI-Assisted Coding 📄 Paper ⚡ AI Lesson 1w ago

Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction

arXiv:2604.01204v1 Announce Type: cross Abstract: Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

CliffSearch: Structured Agentic Co-Evolution over Theory and Code for Scientific Algorithm Discovery

arXiv:2604.01210v1 Announce Type: cross Abstract: Scientific algorithm discovery is iterative: hypotheses are proposed, implemented, stress-tested, and revised.

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

$\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution

arXiv:2604.01212v1 Announce Type: cross Abstract: As LLM agents tackle increasingly complex tasks, a critical question is whether they can maintain strategic co

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago

The Recipe Matters More Than the Kitchen: Mathematical Foundations of the AI Weather Prediction Pipeline

arXiv:2604.01215v1 Announce Type: cross Abstract: AI weather prediction has advanced rapidly, yet no unified mathematical framework explains what determines for

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 1w ago

LAtent Phase Inference from Short time sequences using SHallow REcurrent Decoders (LAPIS-SHRED)

arXiv:2604.01216v1 Announce Type: cross Abstract: Reconstructing full spatio-temporal dynamics from sparse observations in both space and time remains a central

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Code Comprehension then Auditing for Unsupervised LLM Evaluation

arXiv:2410.03131v4 Announce Type: replace Abstract: Large Language Models (LLMs) for unsupervised code correctness evaluation have recently gained attention bec

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

arXiv:2501.09136v4 Announce Type: replace Abstract: Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation an

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

arXiv:2503.02976v3 Announce Type: replace Abstract: Large language models (LLMs), initially developed for generative AI, are now evolving into agentic AI system

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering

arXiv:2505.12189v3 Announce Type: replace Abstract: Large language models (LLMs) exhibit reasoning biases, often conflating content plausibility with formal log

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

arXiv:2506.13841v3 Announce Type: replace Abstract: Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-trainin

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

HiMA-Ecom: Enabling Joint Training of Hierarchical Multi-Agent E-commerce Assistants

arXiv:2506.19846v2 Announce Type: replace Abstract: Hierarchical multi-agent systems based on large language models (LLMs) have become a common paradigm for bui

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Auto-Formulating Dynamic Programming Problems with Large Language Models

arXiv:2507.11737v2 Announce Type: replace Abstract: Dynamic programming (DP) is a fundamental method in operations research, but formulating DP models has tradi

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts

arXiv:2509.21743v2 Announce Type: replace Abstract: Large reasoning models improve accuracy by producing long reasoning traces, but this inflates latency and co

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents

arXiv:2509.25302v2 Announce Type: replace Abstract: The prevalent deployment of Large Language Model agents such as OpenClaw unlocks potential in real-world app

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming

arXiv:2510.18314v2 Announce Type: replace Abstract: As large language model (LLM) agents increasingly automate complex web tasks, they boost productivity while

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks

arXiv:2511.08206v4 Announce Type: replace Abstract: Structured Electronic Health Record (EHR) data stores patient information in relational tables and plays a c