📰 ArXiv cs.AI
Articles from ArXiv cs.AI · 3,344 articles · Updated every 3 hours
🧠 Large Language Models
1mo ago
From Context to Intent: Reasoning-Guided Function-Level Code Completion
arXiv:2508.09537v2 Announce Type: replace-cross Abstract: The growing capabilities of Large Language Models (LLMs) have led to their widespread adoption for fun
1mo ago
From Noisy Labels to Intrinsic Structure: A Geometric-Structural Dual-Guided Framework for Noise-Robust Medical Image Segmentation
arXiv:2509.02419v2 Announce Type: replace-cross Abstract: The effectiveness of convolutional neural networks in medical image segmentation relies on large-scale
1mo ago
From Editor to Dense Geometry Estimator
arXiv:2509.04338v2 Announce Type: replace-cross Abstract: Leveraging visual priors from pre-trained text-to-image (T2I) generative models has shown success in d
🧠 Large Language Models
1mo ago
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
arXiv:2509.06027v2 Announce Type: replace-cross Abstract: With the development of large-scale diffusion-based and language-modeling-based generative models, imp
1mo ago
Selective Classifier-free Guidance for Zero-shot Text-to-speech
arXiv:2509.19668v2 Announce Type: replace-cross Abstract: In zero-shot text-to-speech, achieving a balance between fidelity to the target speaker and adherence
🧠 Large Language Models
1mo ago
MARS: toward more efficient multi-agent collaboration for LLM reasoning
arXiv:2509.20502v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have achieved impressive results in natural language understanding, yet t
🧠 Large Language Models
1mo ago
VL-KnG: Persistent Spatiotemporal Knowledge Graphs from Egocentric Video for Embodied Scene Understanding
arXiv:2510.01483v2 Announce Type: replace-cross Abstract: Vision-language models (VLMs) demonstrate strong image-level scene understanding but often lack persis
🧠 Large Language Models
1mo ago
Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using a GPT-Based VLM: A Preliminary Study on Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework
arXiv:2510.02001v4 Announce Type: replace-cross Abstract: Vision-language models (VLMs) such as GPT (Generative Pre-Trained Transformer) have shown potential fo
1mo ago
Counterfactual Identifiability via Dynamic Optimal Transport
arXiv:2510.08294v2 Announce Type: replace-cross Abstract: We address the open question of counterfactual identification for high-dimensional multivariate outcom
1mo ago
Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
arXiv:2510.10827v2 Announce Type: replace-cross Abstract: Transliteration has emerged as a promising means to bridge the gap between various languages in multil
🧠 Large Language Models
1mo ago
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents
arXiv:2510.14967v2 Announce Type: replace-cross Abstract: Large language model (LLM)-based agents are increasingly trained with reinforcement learning (RL) to e
🧠 Large Language Models
1mo ago
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
arXiv:2510.15994v2 Announce Type: replace-cross Abstract: The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe
🧠 Large Language Models
1mo ago
GUIrilla: A Scalable Framework for Automated Desktop UI Exploration
arXiv:2510.16051v2 Announce Type: replace-cross Abstract: The performance and generalization of foundation models for interactive systems critically depend on t
🧠 Large Language Models
1mo ago
Gaze-VLM:Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
arXiv:2510.21356v2 Announce Type: replace-cross Abstract: Eye gaze offers valuable cues about attention, short-term intent, and future actions, making it a powe
🧠 Large Language Models
1mo ago
Quantifying Systemic Vulnerability in the Foundation Model Industry
arXiv:2510.23421v2 Announce Type: replace-cross Abstract: The foundation model industry exhibits unprecedented concentration in critical inputs: semiconductors,
👁️ Computer Vision
1mo ago
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
arXiv:2510.26865v2 Announce Type: replace-cross Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain experti
🧠 Large Language Models
1mo ago
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
arXiv:2511.05919v3 Announce Type: replace-cross Abstract: LLMs are now an integral part of information retrieval. As such, their role as question answering chat
🧠 Large Language Models
1mo ago
MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding
arXiv:2511.12449v2 Announce Type: replace-cross Abstract: Recent Multimodal Large Language Models (MLLMs) have significantly advanced e-commerce product underst
👁️ Computer Vision
1mo ago
Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network
arXiv:2511.20008v2 Announce Type: replace-cross Abstract: Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs)
🧠 Large Language Models
1mo ago
HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation
arXiv:2511.21732v2 Announce Type: replace-cross Abstract: Humor, as both a creative human activity and a social binding mechanism, has long posed a major challe
🧠 Large Language Models
1mo ago
Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
arXiv:2512.02487v2 Announce Type: replace-cross Abstract: Recent advances in 3D scene-language understanding have leveraged Large Language Models (LLMs) for 3D
🧠 Large Language Models
1mo ago
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
arXiv:2512.03454v3 Announce Type: replace-cross Abstract: Interpreting natural-language commands to localize target objects is critical for autonomous driving (
🧠 Large Language Models
1mo ago
Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)
arXiv:2512.06737v3 Announce Type: replace-cross Abstract: The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluat
🧠 Large Language Models
1mo ago
Metaphor-based Jailbreak Attacks on Text-to-Image Models
arXiv:2512.10766v2 Announce Type: replace-cross Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensit
DeepCamp AI