📰 ArXiv cs.AI
Articles from ArXiv cs.AI · 7,014 articles · Updated every 3 hours
ArXiv cs.AI
📄 Paper
1w ago
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
arXiv:2604.11775v1 Announce Type: cross Abstract: Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions but are typic
ArXiv cs.AI
📄 Paper
1w ago
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
arXiv:2604.11778v1 Announce Type: cross Abstract: Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in
ArXiv cs.AI
📄 Paper
1w ago
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
arXiv:2604.11784v1 Announce Type: cross Abstract: GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with a
ArXiv cs.AI
📄 Paper
1w ago
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
arXiv:2604.11790v1 Announce Type: cross Abstract: Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating compl
ArXiv cs.AI
📄 Paper
1w ago
A Mechanistic Analysis of Looped Reasoning Language Models
arXiv:2604.11791v1 Announce Type: cross Abstract: Reasoning has become a central capability in large language models. Recent research has shown that reasoning p
ArXiv cs.AI
📄 Paper
1w ago
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts
arXiv:2604.11796v1 Announce Type: cross Abstract: Recently, large language models (LLMs) are capable of generating highly fluent textual content. While they off
ArXiv cs.AI
📄 Paper
1w ago
Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net
arXiv:2604.11798v1 Announce Type: cross Abstract: Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains t
ArXiv cs.AI
📄 Paper
1w ago
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
arXiv:2604.11805v1 Announce Type: cross Abstract: We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, m
ArXiv cs.AI
📄 Paper
1w ago
Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems
arXiv:2604.11807v1 Announce Type: cross Abstract: The stable operation of autonomous off-grid photovoltaic systems dictates reliance on solar forecasting algori
ArXiv cs.AI
📄 Paper
1w ago
Can Large Language Models Infer Causal Relationships from Real-World Text?
arXiv:2505.18931v4 Announce Type: replace Abstract: Understanding and inferring causal relationships from texts is a core aspect of human cognition and is essen
ArXiv cs.AI
📄 Paper
1w ago
VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
arXiv:2506.02387v3 Announce Type: replace Abstract: Recent advancements in Vision Language Models (VLMs) have expanded their capabilities to interactive agent t
ArXiv cs.AI
📄 Paper
1w ago
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
arXiv:2507.03336v4 Announce Type: replace Abstract: Large language models (LLMs) are increasingly tasked with invoking enterprise APIs, yet they routinely falte
ArXiv cs.AI
📄 Paper
1w ago
PosterGen: Aesthetic-Aware Multi-Modal Paper-to-Poster Generation via Multi-Agent LLMs
arXiv:2508.17188v2 Announce Type: replace Abstract: Multi-agent systems built upon large language models (LLMs) have demonstrated remarkable capabilities in tac
ArXiv cs.AI
📄 Paper
1w ago
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
arXiv:2509.00891v3 Announce Type: replace Abstract: Real-world adoption of closed-loop insulin delivery systems (CLIDS) in type 1 diabetes remains low, driven n
ArXiv cs.AI
📄 Paper
1w ago
RISK: A Framework for GUI Agents in E-commerce Risk Management
arXiv:2509.21982v2 Announce Type: replace Abstract: E-commerce risk management requires aggregating diverse, deeply embedded web data through multi-step, statef
ArXiv cs.AI
📄 Paper
1w ago
Interactive Learning for LLM Reasoning
arXiv:2509.26306v4 Announce Type: replace Abstract: Existing multi-agent learning approaches have developed interactive training environments to explicitly prom
ArXiv cs.AI
📄 Paper
1w ago
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
arXiv:2509.26627v2 Announce Type: replace Abstract: Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensi
ArXiv cs.AI
📄 Paper
1w ago
Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
arXiv:2510.01544v2 Announce Type: replace Abstract: Diffusion-based large language models offer a non-autoregressive alternative for text generation, but enabli
ArXiv cs.AI
📄 Paper
1w ago
Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents
arXiv:2510.05188v2 Announce Type: replace Abstract: Although LLMs have been widely adopted for creative content generation, a single-pass process often struggle
ArXiv cs.AI
📄 Paper
1w ago
SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance
arXiv:2510.07972v3 Announce Type: replace Abstract: Query-product relevance prediction is vital for AI-driven e-commerce, yet current LLM-based approaches face
ArXiv cs.AI
📄 Paper
1w ago
Graph-Coarsening Approach for the Capacitated Vehicle Routing Problem with Time Windows
arXiv:2510.22329v2 Announce Type: replace Abstract: The Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) is a fundamental NP-hard optimization pro
ArXiv cs.AI
📄 Paper
1w ago
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
arXiv:2510.24168v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) have significantly advanced GUI agents, yet long-horizon automation
ArXiv cs.AI
📄 Paper
1w ago
Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight
arXiv:2512.19691v3 Announce Type: replace Abstract: Reference labels for machine-learning benchmarks are increasingly synthesized with LLM assistance, but their
ArXiv cs.AI
📄 Paper
1w ago
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
arXiv:2601.07224v2 Announce Type: replace Abstract: While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard pa
DeepCamp AI