📰 ArXiv cs.AI
Articles from ArXiv cs.AI · 3,273 articles · Updated every 3 hours · View all news
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models
arXiv:2604.00547v1 Announce Type: new Abstract: Unified Multimodal Large Models (UMLMs) integrate understanding and generation capabilities within a single arch
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
BloClaw: An Omniscient, Multi-Modal Agentic Workspace for Next-Generation Scientific Discovery
arXiv:2604.00550v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) into life sciences has catalyzed the development of "AI Scientis
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents
arXiv:2604.00555v1 Announce Type: new Abstract: Enterprise adoption of Large Language Models (LLMs) is constrained by hallucination, domain drift, and the inabi
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Agent psychometrics: Task-level performance prediction in agentic coding benchmarks
arXiv:2604.00594v1 Announce Type: new Abstract: As the focus in LLM-based coding shifts from static single-step code generation to multi-step agentic interactio
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
arXiv:2604.00716v1 Announce Type: new Abstract: Transformer language models contain localized reasoning circuits, contiguous layer blocks that improve reasoning
ArXiv cs.AI
🤖 AI Agents & Automation
📄 Paper
⚡ AI Lesson
1w ago
UK AISI Alignment Evaluation Case-Study
arXiv:2604.00788v1 Announce Type: new Abstract: This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning
arXiv:2604.00790v1 Announce Type: new Abstract: While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as compe
ArXiv cs.AI
🤖 AI Agents & Automation
📄 Paper
⚡ AI Lesson
1w ago
Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning
arXiv:2604.00795v1 Announce Type: new Abstract: We propose the Preference Guided Iterated Pareto Referent Optimisation (PG-IPRO) for urban route planning for pe
ArXiv cs.AI
🤖 AI Agents & Automation
📄 Paper
⚡ AI Lesson
1w ago
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
arXiv:2604.00842v1 Announce Type: new Abstract: Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assista
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models
arXiv:2604.00890v1 Announce Type: new Abstract: Geometric Problem Solving (GPS) remains at the heart of enhancing mathematical reasoning in large language model
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Experience as a Compass: Multi-agent RAG with Evolving Orchestration and Agent Prompts
arXiv:2604.00901v1 Announce Type: new Abstract: Multi-agent Retrieval-Augmented Generation (RAG), wherein each agent takes on a specific role, supports hard que
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor
arXiv:2604.00931v1 Announce Type: new Abstract: Existing methods for AI psychological counselors predominantly rely on supervised fine-tuning using static dialo
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
OmniMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
arXiv:2604.01007v1 Announce Type: new Abstract: AI agents increasingly operate over extended time horizons, yet their ability to retain, organize, and recall mu
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Adversarial Moral Stress Testing of Large Language Models
arXiv:2604.01108v1 Announce Type: new Abstract: Evaluating the ethical robustness of large language models (LLMs) deployed in software systems remains challengi
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
arXiv:2604.01151v1 Announce Type: new Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Therefore I am. I Think
arXiv:2604.01202v1 Announce Type: new Abstract: We consider the question: when a large language reasoning model makes a choice, did it think first and then deci
ArXiv cs.AI
🤖 AI Agents & Automation
📄 Paper
⚡ AI Lesson
1w ago
HippoCamp: Benchmarking Contextual Agents on Personal Computers
arXiv:2604.01221v1 Announce Type: new Abstract: We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. U
ArXiv cs.AI
🤖 AI Agents & Automation
📄 Paper
⚡ AI Lesson
1w ago
Agentic AI -- Physicist Collaboration in Experimental Particle Physics: A Proof-of-Concept Measurement with LEP Open Data
arXiv:2603.05735v2 Announce Type: cross Abstract: We present an AI agentic measurement of the thrust distribution in $e^{+}e^{-}$ collisions at $\sqrt{s}=91.2$~
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Two-Stage Optimizer-Aware Online Data Selection for Large Language Models
arXiv:2604.00001v1 Announce Type: cross Abstract: Gradient-based data selection offers a principled framework for estimating sample utility in large language mo
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Benchmark for Assessing Olfactory Perception of Large Language Models
arXiv:2604.00002v1 Announce Type: cross Abstract: Here we introduce the Olfactory Perception (OP) benchmark, designed to assess the capability of large language
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction
arXiv:2604.00003v1 Announce Type: cross Abstract: This study evaluates the reliability of information extraction approaches from KRS documents using three strat
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
LinearARD: Linear-Memory Attention Distillation for RoPE Restoration
arXiv:2604.00004v1 Announce Type: cross Abstract: The extension of context windows in Large Language Models is typically facilitated by scaling positional encod
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
arXiv:2604.00007v1 Announce Type: cross Abstract: We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, a
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
How Trustworthy Are LLM-as-Judge Ratings for Interpretive Responses? Implications for Qualitative Research Workflows
arXiv:2604.00008v1 Announce Type: cross Abstract: As qualitative researchers show growing interest in using automated tools to support interpretive analysis, a
DeepCamp AI