📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,169 articles · Updated every 3 hours

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
arXiv:2604.01151v1 Announce Type: new Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Therefore I am. I Think
arXiv:2604.01202v1 Announce Type: new Abstract: We consider the question: when a large language reasoning model makes a choice, did it think first and then deci
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago
HippoCamp: Benchmarking Contextual Agents on Personal Computers
arXiv:2604.01221v1 Announce Type: new Abstract: We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. U
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 1w ago
Agentic AI -- Physicist Collaboration in Experimental Particle Physics: A Proof-of-Concept Measurement with LEP Open Data
arXiv:2603.05735v2 Announce Type: cross Abstract: We present an AI agentic measurement of the thrust distribution in $e^{+}e^{-}$ collisions at $\sqrt{s}=91.2$~
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Two-Stage Optimizer-Aware Online Data Selection for Large Language Models
arXiv:2604.00001v1 Announce Type: cross Abstract: Gradient-based data selection offers a principled framework for estimating sample utility in large language mo
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Benchmark for Assessing Olfactory Perception of Large Language Models
arXiv:2604.00002v1 Announce Type: cross Abstract: Here we introduce the Olfactory Perception (OP) benchmark, designed to assess the capability of large language
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction
arXiv:2604.00003v1 Announce Type: cross Abstract: This study evaluates the reliability of information extraction approaches from KRS documents using three strat
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
LinearARD: Linear-Memory Attention Distillation for RoPE Restoration
arXiv:2604.00004v1 Announce Type: cross Abstract: The extension of context windows in Large Language Models is typically facilitated by scaling positional encod
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
arXiv:2604.00007v1 Announce Type: cross Abstract: We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, a
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
How Trustworthy Are LLM-as-Judge Ratings for Interpretive Responses? Implications for Qualitative Research Workflows
arXiv:2604.00008v1 Announce Type: cross Abstract: As qualitative researchers show growing interest in using automated tools to support interpretive analysis, a
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors -- Vision, Implementation Attempt, and Lessons from AI-Assisted Development
arXiv:2604.00009v1 Announce Type: cross Abstract: We present the design rationale, implementation attempt, and failure analysis of Eyla, a proposed identity-anc
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Can LLMs Perceive Time? An Empirical Investigation
arXiv:2604.00010v1 Announce Type: cross Abstract: Large language models cannot estimate how long their own tasks take. We investigate this limitation through fo
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager
arXiv:2604.00011v1 Announce Type: cross Abstract: The growing prominence of large language models (LLMs) in daily life has heightened concerns that LLMs exhibit
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms
arXiv:2604.00012v1 Announce Type: cross Abstract: Despite the impressive performance of general-purpose large language models (LLMs), they often require fine-tu
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
MSA-Thinker: Discrimination-Calibration Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis
arXiv:2604.00013v1 Announce Type: cross Abstract: Multimodal sentiment analysis aims to understand human emotions by integrating textual, auditory, and visual m
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Are they human? Detecting large language models by probing human memory constraints
arXiv:2604.00016v1 Announce Type: cross Abstract: The validity of online behavioral research relies on study participants being human rather than machine. In th
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning
arXiv:2604.00018v1 Announce Type: cross Abstract: Decoding strategies play a central role in shaping the reasoning ability of large language models (LLMs). Trad
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation
arXiv:2604.00019v1 Announce Type: cross Abstract: We present a configurable pipeline for generating multilingual sets of entities with specified characteristics
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models
arXiv:2604.00021v1 Announce Type: cross Abstract: Alignment safety research assumes that ethical instructions improve model behavior, but how language models in
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago
Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
arXiv:2604.00022v1 Announce Type: cross Abstract: Multi-dimensional rubric-based dialogue evaluation is widely used to assess conversational AI, yet its criteri