📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,273 articles · Updated every 3 hours

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 2d ago
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
arXiv:2509.22258v5 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medi
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 2d ago
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
arXiv:2509.23279v2 Announce Type: replace-cross Abstract: The rapid progress of image-to-video (I2V) generation models has introduced significant risks by enabl
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks
arXiv:2509.24186v2 Announce Type: replace-cross Abstract: Accuracy-based evaluation of Large Language Models (LLMs) measures benchmark-specific performance rath
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
ACT: Agentic Classification Tree
arXiv:2509.26433v4 Announce Type: replace-cross Abstract: When used in high-stakes settings, AI systems are expected to produce decisions that are transparent,
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
Autonomy Reshapes How Personalization Affects Privacy Concerns and Trust in LLM Agents
arXiv:2510.04465v2 Announce Type: replace-cross Abstract: LLM agents require personal information for personalization in order to effectively act on users' beha
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
arXiv:2510.06800v3 Announce Type: replace-cross Abstract: As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
arXiv:2510.07985v3 Announce Type: replace-cross Abstract: Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing t
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 2d ago
Clear Roads, Clear Vision: Advancements in Multi-Weather Restoration for Smart Transportation
arXiv:2510.09228v2 Announce Type: replace-cross Abstract: Adverse weather conditions such as haze, rain, and snow significantly degrade the quality of images an
ArXiv cs.AI 🛠️ AI Tools & Apps 📄 Paper ⚡ AI Lesson 2d ago
Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments
arXiv:2510.13820v2 Announce Type: replace-cross Abstract: This research proposes an extensive technique for monitoring and controlling the industrial parameters
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
A Linguistics-Aware LLM Watermarking via Syntactic Predictability
arXiv:2510.13829v2 Announce Type: replace-cross Abstract: As large language models (LLMs) continue to advance rapidly, reliable governance tools have become cri
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
arXiv:2510.15148v2 Announce Type: replace-cross Abstract: Omni-modal large language models (OLLMs) aim to unify audio, vision, and text understanding within a s
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
arXiv:2510.15746v2 Announce Type: replace-cross Abstract: Ideal or real - that is the question. In this work, we explore whether principles from game theory can
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 2d ago
AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring
arXiv:2510.16066v4 Announce Type: replace-cross Abstract: Despite accounting for 96.1% of all businesses in Malaysia, access to financing remains one of the mos
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning
arXiv:2510.18814v2 Announce Type: replace-cross Abstract: Can language models improve their reasoning performance without external rewards, using only their own
ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 2d ago
Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems
arXiv:2510.20728v3 Announce Type: replace-cross Abstract: Exact scientific discovery requires more than heuristic search: candidate constructions must be turned
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted MDE
arXiv:2510.25890v3 Announce Type: replace-cross Abstract: ATLAS is a constraint-guided generation framework for structured engineering artifacts whose outputs m
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
arXiv:2511.06391v3 Announce Type: replace-cross Abstract: Optimization of offensive content moderation models for different types of hateful messages is typical
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
arXiv:2511.06448v2 Announce Type: replace-cross Abstract: In this work, we study the risks of collective financial fraud in large-scale multi-agent systems powe
ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2d ago
FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis
arXiv:2511.08887v4 Announce Type: replace-cross Abstract: Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient surviv
ArXiv cs.AI 💻 AI-Assisted Coding 📄 Paper ⚡ AI Lesson 2d ago
SPHINX: A Synthetic Environment for Visual Perception and Reasoning
arXiv:2511.20814v2 Announce Type: replace-cross Abstract: We present Sphinx, a synthetic environment for visual perception and reasoning that targets core cogni