📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 8,253 articles · Updated every 3 hours

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

VERDI: VLM-Embedded Reasoning for Autonomous Driving

arXiv:2505.15925v4 Announce Type: replace-cross Abstract: While autonomous driving (AD) stacks struggle with decision making under partial observability and rea

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

Informatics for Food Processing

arXiv:2505.17087v2 Announce Type: replace-cross Abstract: This chapter explores the evolution, classification, and health implications of food processing, while

ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 3w ago

SoSBench: Benchmarking Safety Alignment on Six Scientific Domains

arXiv:2505.21605v3 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and gr

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

LLMs Judging LLMs: A Simplex Perspective

arXiv:2505.21972v3 Announce Type: replace-cross Abstract: Given the challenge of automatically evaluating free-form outputs from large language models (LLMs), a

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

arXiv:2505.24535v3 Announce Type: replace-cross Abstract: Controlling multiple behavioral attributes in large language models (LLMs) at inference time is a chal

ArXiv cs.AI 💻 AI-Assisted Coding 📄 Paper ⚡ AI Lesson 3w ago

PhysGaia: A Physics-Aware Benchmark with Multi-Body Interactions for Dynamic Novel View Synthesis

arXiv:2506.02794v3 Announce Type: replace-cross Abstract: We introduce PhysGaia, a novel physics-aware benchmark for Dynamic Novel View Synthesis (DyNVS) that e

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Large Language Models for Combinatorial Optimization of Design Structure Matrix

arXiv:2506.09749v3 Announce Type: replace-cross Abstract: In complex engineering systems, the dependencies among components or development activities are often

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

ZINA: Multimodal Fine-grained Hallucination Detection and Editing

arXiv:2506.13130v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) often generate hallucinations, where the output deviates from

ArXiv cs.AI 💻 AI-Assisted Coding 📄 Paper ⚡ AI Lesson 3w ago

Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

arXiv:2506.19591v2 Announce Type: replace-cross Abstract: Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping,

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 3w ago

PRISM: Lightweight Multivariate Time-Series Classification through Symmetric Multi-Resolution Convolutional Layers

arXiv:2508.04503v3 Announce Type: replace-cross Abstract: Multivariate time series classification supports applications from wearable sensing to biomedical moni

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Making Prompts First-Class Citizens for Adaptive LLM Pipelines

arXiv:2508.05012v2 Announce Type: replace-cross Abstract: Modern LLM pipelines increasingly resemble complex data-centric applications: they retrieve data, corr

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 3w ago

CATNet: A geometric deep learning approach for CAT bond spread prediction in the primary market

arXiv:2508.10208v2 Announce Type: replace-cross Abstract: Traditional models for pricing catastrophe (CAT) bonds struggle to capture the complex, relational dat

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

arXiv:2508.13998v2 Announce Type: replace-cross Abstract: Generalization in embodied AI is hindered by the "seeing-to-doing gap," which stems from data scarcity

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference

arXiv:2508.16703v2 Announce Type: replace-cross Abstract: On-device running Large Language Models (LLMs) is nowadays a critical enabler towards preserving user

ArXiv cs.AI 📐 ML Fundamentals 📄 Paper ⚡ AI Lesson 3w ago

Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

arXiv:2509.05892v2 Announce Type: replace-cross Abstract: Accurate segmentation of carotid artery structures in histopathological images is vital for cardiovasc

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

RAPTOR: A Foundation Policy for Quadrotor Control

arXiv:2509.11481v2 Announce Type: replace-cross Abstract: Humans are remarkably data-efficient when adapting to new unseen conditions, like driving a new car. I

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow

arXiv:2509.12626v3 Announce Type: replace-cross Abstract: Aligning agentic AI with user intent is critical for delegating complex, socially embedded tasks, yet

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 3w ago

Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

arXiv:2509.22258v5 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medi

ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 3w ago

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

arXiv:2509.23279v2 Announce Type: replace-cross Abstract: The rapid progress of image-to-video (I2V) generation models has introduced significant risks by enabl

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks

arXiv:2509.24186v2 Announce Type: replace-cross Abstract: Accuracy-based evaluation of Large Language Models (LLMs) measures benchmark-specific performance rath

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

ACT: Agentic Classification Tree

arXiv:2509.26433v4 Announce Type: replace-cross Abstract: When used in high-stakes settings, AI systems are expected to produce decisions that are transparent,

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Autonomy Reshapes How Personalization Affects Privacy Concerns and Trust in LLM Agents

arXiv:2510.04465v2 Announce Type: replace-cross Abstract: LLM agents require personal information for personalization in order to effectively act on users' beha

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline

arXiv:2510.06800v3 Announce Type: replace-cross Abstract: As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 3w ago

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

arXiv:2510.07985v3 Announce Type: replace-cross Abstract: Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing t