📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 5,298 articles · Updated every 3 hours

ArXiv cs.AI 📄 Paper 6d ago
Learning to Play Piano in the Real World
arXiv:2503.15481v3 Announce Type: replace-cross Abstract: Towards the grand challenge of achieving human-level manipulation in robots, playing piano is a compel
ArXiv cs.AI 📄 Paper 6d ago
AccidentSim: Generating Vehicle Collision Videos with Physically Realistic Collision Trajectories from Real-World Accident Reports
arXiv:2503.20654v4 Announce Type: replace-cross Abstract: Collecting real-world vehicle accident videos for autonomous driving research is challenging due to th
ArXiv cs.AI 📄 Paper 6d ago
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
arXiv:2503.23514v2 Announce Type: replace-cross Abstract: Large language models (LLMs) can carry out human-like dialogue, but unlike humans, they are stateless
ArXiv cs.AI 📄 Paper 6d ago
TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
arXiv:2504.04099v2 Announce Type: replace-cross Abstract: Large Vision-Language Models have demonstrated remarkable capabilities, yet they suffer from hallucina
ArXiv cs.AI 📄 Paper 6d ago
Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights
arXiv:2504.06307v2 Announce Type: replace-cross Abstract: The rapid adoption of large language models (LLMs) has led to significant energy consumption and carbo
ArXiv cs.AI 📄 Paper 6d ago
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
arXiv:2504.13818v4 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as the leading approach for enhancin
ArXiv cs.AI 📄 Paper 6d ago
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
arXiv:2504.14386v2 Announce Type: replace-cross Abstract: Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial info
ArXiv cs.AI 📄 Paper 6d ago
Non-stationary Diffusion For Probabilistic Time Series Forecasting
arXiv:2505.04278v3 Announce Type: replace-cross Abstract: Due to the dynamics of underlying physics and external influences, the uncertainty of time series ofte
ArXiv cs.AI 📄 Paper 6d ago
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
arXiv:2505.04842v2 Announce Type: replace-cross Abstract: Prevalent reinforcement learning (RL) methods for fine-tuning LLM reasoners, such as GRPO or Leave-one
ArXiv cs.AI 📄 Paper 6d ago
Auto-regressive transformation for image alignment
arXiv:2505.04864v2 Announce Type: replace-cross Abstract: Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale
ArXiv cs.AI 📄 Paper 6d ago
Variational Visual Question Answering for Uncertainty-Aware Selective Prediction
arXiv:2505.09591v3 Announce Type: replace-cross Abstract: Despite remarkable progress in recent years, Vision Language Models (VLMs) remain prone to overconfide
ArXiv cs.AI 📄 Paper 6d ago
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
arXiv:2505.11737v4 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality rem
ArXiv cs.AI 📄 Paper 6d ago
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
arXiv:2505.13777v2 Announce Type: replace-cross Abstract: We present Sat2Sound, a unified multimodal framework for geospatial soundscape understanding, designed
ArXiv cs.AI 📄 Paper 6d ago
SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
arXiv:2505.17012v3 Announce Type: replace-cross Abstract: Existing evaluations of multimodal large language models (MLLMs) on spatial intelligence are typically
ArXiv cs.AI 📄 Paper 6d ago
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
arXiv:2505.17022v2 Announce Type: replace-cross Abstract: Visual generation models have made remarkable progress in creating realistic images from text prompts,
ArXiv cs.AI 📄 Paper 6d ago
Tuning Language Models for Robust Prediction of Diverse User Behaviors
arXiv:2505.17682v2 Announce Type: replace-cross Abstract: Predicting user behavior is essential for intelligent assistant services, yet deep learning models oft
ArXiv cs.AI 📄 Paper 6d ago
Learning World Models for Interactive Video Generation
arXiv:2505.21996v3 Announce Type: replace-cross Abstract: Foundational world models must be both interactive and preserve spatiotemporal coherence for effective
ArXiv cs.AI 📄 Paper 6d ago
Towards Reasonable Concept Bottleneck Models
arXiv:2506.05014v2 Announce Type: replace-cross Abstract: We propose a novel, flexible, and efficient framework for designing Concept Bottleneck Models (CBMs) t
ArXiv cs.AI 📄 Paper 6d ago
Progressive Multimodal Interaction Network for Reliable Quantification of Fish Feeding Intensity in Aquaculture
arXiv:2506.14170v3 Announce Type: replace-cross Abstract: Accurate quantification of fish feeding intensity is crucial for precision feeding in aquaculture, as
ArXiv cs.AI 📄 Paper 6d ago
LLM-based Realistic Safety-Critical Driving Video Generation
arXiv:2507.01264v2 Announce Type: replace-cross Abstract: Designing diverse and safety-critical driving scenarios is essential for evaluating autonomous driving