📰 ArXiv cs.AI
Articles from ArXiv cs.AI · 6,601 articles · Updated every 3 hours
ArXiv cs.AI
📄 Paper
1w ago
Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models
arXiv:2604.12832v1 Announce Type: cross Abstract: Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through m
ArXiv cs.AI
📄 Paper
1w ago
FastGrasp: Learning-based Whole-body Control method for Fast Dexterous Grasping with Mobile Manipulators
arXiv:2604.12879v1 Announce Type: cross Abstract: Fast grasping is critical for mobile robots in logistics, manufacturing, and service applications. Existing me
ArXiv cs.AI
📄 Paper
1w ago
Towards Long-horizon Agentic Multimodal Search
arXiv:2604.12890v1 Announce Type: cross Abstract: Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting te
ArXiv cs.AI
📄 Paper
1w ago
Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss
arXiv:2604.12911v1 Announce Type: cross Abstract: Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by fro
ArXiv cs.AI
📄 Paper
1w ago
CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference
arXiv:2604.12913v1 Announce Type: cross Abstract: Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code fro
ArXiv cs.AI
📄 Paper
1w ago
Distorted or Fabricated? A Survey on Hallucination in Video LLMs
arXiv:2604.12944v1 Announce Type: cross Abstract: Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video
ArXiv cs.AI
📄 Paper
1w ago
Parallax: Why AI Agents That Think Must Never Act
arXiv:2604.12986v1 Announce Type: cross Abstract: Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with pro
ArXiv cs.AI
📄 Paper
1w ago
ROSE: An Intent-Centered Evaluation Metric for NL2SQL
arXiv:2604.12988v1 Announce Type: cross Abstract: Execution Accuracy (EX), the widely used metric for evaluating the effectiveness of Natural Language to SQL (N
ArXiv cs.AI
📄 Paper
1w ago
LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software
arXiv:2604.12994v1 Announce Type: cross Abstract: Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead
ArXiv cs.AI
📄 Paper
1w ago
One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
arXiv:2604.13006v1 Announce Type: cross Abstract: Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfuln
ArXiv cs.AI
📄 Paper
1w ago
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
arXiv:2604.13010v1 Announce Type: cross Abstract: On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. How
ArXiv cs.AI
📄 Paper
1w ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
arXiv:2604.13016v1 Announce Type: cross Abstract: On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet it
ArXiv cs.AI
📄 Paper
1w ago
Representation geometry shapes task performance in vision-language modeling for CT enterography
arXiv:2604.13021v1 Announce Type: cross Abstract: Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (
ArXiv cs.AI
📄 Paper
1w ago
Visual Preference Optimization with Rubric Rewards
arXiv:2604.13029v1 Announce Type: cross Abstract: The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality
ArXiv cs.AI
📄 Paper
1w ago
SmellNet: A Large-scale Dataset for Real-world Smell Recognition
arXiv:2506.00239v5 Announce Type: replace Abstract: The ability of AI to sense and identify various substances based on their smell alone can have profound impa
ArXiv cs.AI
📄 Paper
1w ago
Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
arXiv:2506.14092v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly deployed in decision-support systems for high-stakes domains s
ArXiv cs.AI
📄 Paper
1w ago
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
arXiv:2507.22359v4 Announce Type: replace Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reli
ArXiv cs.AI
📄 Paper
1w ago
Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling
arXiv:2508.04282v3 Announce Type: replace Abstract: Recent benchmarks for memory-augmented reinforcement learning (RL) have introduced partially observable Mark
ArXiv cs.AI
📄 Paper
1w ago
Mantis: A Foundation Model for Mechanistic Disease Forecasting
arXiv:2508.12260v5 Announce Type: replace Abstract: Infectious disease forecasting in novel outbreaks or low-resource settings is hampered by the need for large
ArXiv cs.AI
📄 Paper
1w ago
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
arXiv:2509.25758v2 Announce Type: replace Abstract: The remarkable capabilities of modern large reasoning models are largely unlocked through post-training tech
ArXiv cs.AI
📄 Paper
1w ago
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
arXiv:2509.25843v2 Announce Type: replace Abstract: Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be ci
ArXiv cs.AI
📄 Paper
1w ago
The Stackelberg Speaker: Optimizing Persuasive Communication in Social Deduction Games
arXiv:2510.09087v2 Announce Type: replace Abstract: Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However,
ArXiv cs.AI
📄 Paper
1w ago
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
arXiv:2510.23026v5 Announce Type: replace Abstract: Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planni
ArXiv cs.AI
📄 Paper
1w ago
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
arXiv:2510.23538v2 Announce Type: replace Abstract: The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the ri