Foundations
Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models
Skills in this topic
3 skills
Showing 225 reads from curated sources
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
arXiv:2603.29777v1 Announce Type: cross Abstract: Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines
arXiv:2603.29927v1 Announce Type: cross Abstract: Transferring large volumes of high-resolution images during wind turbine inspections introduces a bottleneck i
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Streaming 4D Visual Geometry Transformer
arXiv:2507.11539v2 Announce Type: replace-cross Abstract: Perceiving and reconstructing 3D geometry from videos is a fundamental yet challenging computer vision

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Background-removal model by Pixelcut: A Model Overview
background-removal is an AI-powered tool created by Pixelcut that handles the task of removing backgrounds from images with precision and speed.
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
1mo ago
When the Track Is Your Lab: Meet the Team Racing Without a Driver
What does it take to build an AI that competes in professional motorsports — no driver, no remote control, just autonomous decision-making at race speed? Find o

ArsTechnica Tech
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Quantum computers need vastly fewer resources than thought to break vital encryption
is coming, and it won't be as expensive as thought.
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
An End-to-end Flight Control Network for High-speed UAV Obstacle Avoidance based on Event-Depth Fusion
arXiv:2603.27181v1 Announce Type: cross Abstract: Achieving safe, high-speed autonomous flight in complex environments with static, dynamic, or mixed obstacles
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Guided Lensless Polarization Imaging
arXiv:2603.27357v1 Announce Type: cross Abstract: Polarization imaging captures the polarization state of light, revealing information invisible to the human ey
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Attend The OpenCV-SID Conference On Computer Vision & AI This May 4th
OpenCV is continuing our partnership with the awesome Display Week conference, joining them in Los Angeles this May 4th for a special one-day event packed with
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
arXiv:2603.25863v1 Announce Type: cross Abstract: This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification
arXiv:2603.25935v1 Announce Type: cross Abstract: This work presents a new Hybrid Dense SwinV2, a two-branch framework that jointly leverages densely connected
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
arXiv:2603.25946v1 Announce Type: cross Abstract: High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by t
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
arXiv:2603.26015v1 Announce Type: cross Abstract: Human age estimation from facial images represents a challenging computer vision task with significant applica
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting
arXiv:2603.26067v1 Announce Type: cross Abstract: Physical adversarial camouflage poses a severe security threat to autonomous driving systems by mapping advers
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
An Object Web Seminar: A Retrospective on a Technical Dialogue Still Reverberating
arXiv:2603.26203v1 Announce Type: cross Abstract: Technology change happens quickly such that new trends tend to crowd out the focus on what was new just yester
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation
arXiv:2603.26260v1 Announce Type: cross Abstract: Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond the training set. Existin
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
arXiv:2603.26551v1 Announce Type: cross Abstract: Vision backbone networks play a central role in modern computer vision. Enhancing their efficiency directly be
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
arXiv:2603.26653v1 Announce Type: cross Abstract: We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric vide
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Binary Verification for Zero-Shot Vision
arXiv:2511.10983v2 Announce Type: replace-cross Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs.
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Particulate: Feed-Forward 3D Object Articulation
arXiv:2512.11798v2 Announce Type: replace-cross Abstract: We introduce Particulate, a feed-forward model that, given a 3D mesh of an object, infers its articula
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution
arXiv:2601.07855v2 Announce Type: replace-cross Abstract: For 3D perception systems to operate reliably in real-world environments, they must remain robust to e
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
arXiv:2601.13622v3 Announce Type: replace-cross Abstract: Large vision-language models (LVLMs) are typically trained using autoregressive language modeling obje
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Towards single-shot coherent imaging via overlap-free ptychography
arXiv:2602.21361v2 Announce Type: replace-cross Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throu
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
arXiv:2603.14375v2 Announce Type: replace-cross Abstract: While recent generative video models have achieved remarkable visual realism and are being explored as

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Google Confirms High-Risk Update For 3.5 Billion Chrome Users
Nearly all 3.5 billion Chrome browser users will soon see a ‘high-risk’ security update from Google. Here’s what you need to know.

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Uh Oh—New ‘Hack Yourself’ Apple Mac Attack Can Steal Your Passwords
A newly discovered attack sandbags Apple users into hacking themselves. Here’s what all Mac users need to know.
Dev.to AI
👁️ Computer Vision
⚡ AI Lesson
1mo ago
$58.3B in Synthetic Fraud Warns Investigators: "I Eyeballed It" Won't Hold Up Much Longer
The $58 Billion Synthetic Identity Crisis For developers building computer vision pipelines, biometric authentication, or OSINT tools, the latest fraud projecti

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Building Ultra-Lightweight Image Classifiers with TinyVision (Part 1)
This article explores how small image classification models can get while remaining effective. Using handcrafted feature pipelines and compact CNN architectures

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
1mo ago
When Verified Source Lies
I deployed a staking vault on Sepolia and got it verified on Etherscan with a green checkmark. The source code contains a storage write that does not exist in t
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Is Geometry Enough? An Evaluation of Landmark-Based Gaze Estimation
arXiv:2603.24724v1 Announce Type: cross Abstract: Appearance-based gaze estimation frequently relies on deep Convolutional Neural Networks (CNNs). These models
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
arXiv:2603.25109v1 Announce Type: cross Abstract: Data augmentation is a key technique for improving the robustness of image classification models. However, man
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling
arXiv:2603.25170v1 Announce Type: cross Abstract: In complex environments, infrared object detection exhibits broad applicability and stability across diverse s
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Image Rotation Angle Estimation: Comparing Circular-Aware Methods
arXiv:2603.25351v1 Announce Type: cross Abstract: Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challen
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case
arXiv:2603.25510v1 Announce Type: cross Abstract: The use of hyperspectral imaging (HSI) in autonomous driving (AD), while promising, faces many challenges rela
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
arXiv:2603.25524v1 Announce Type: cross Abstract: Long-term behavioral monitoring of individual animals is crucial for studying behavioral changes that occur ov
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming
arXiv:2603.25686v1 Announce Type: cross Abstract: Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-refere
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
PixelSmile: Toward Fine-Grained Facial Expression Editing
arXiv:2603.25728v1 Announce Type: cross Abstract: Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, w
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Generative deep learning for foundational video translation in ultrasound
arXiv:2511.03255v2 Announce Type: replace-cross Abstract: Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medi
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting
arXiv:2601.03824v3 Announce Type: replace-cross Abstract: Generalizable 3D Gaussian Splatting aims to directly predict Gaussian parameters using a feed-forward
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy
arXiv:2602.01939v3 Announce Type: replace-cross Abstract: Recently, active vision has reemerged as an important concept for manipulation, since visual occlusion
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Monocular Normal Estimation via Shading Sequence Estimation
arXiv:2602.09929v5 Announce Type: replace-cross Abstract: Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
arXiv:2603.00141v3 Announce Type: replace-cross Abstract: Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by e

Microsoft Research
👁️ Computer Vision
⚡ AI Lesson
1mo ago
AsgardBench: A benchmark for visually grounded interactive planning
Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Estimating Individual Tree Height and Species from UAV Imagery
arXiv:2603.23669v1 Announce Type: cross Abstract: Accurate estimation of forest biomass, a major carbon sink, relies heavily on tree-level traits such as height
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Revealing Multi-View Hallucination in Large Vision-Language Models
arXiv:2603.23934v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from d
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking
arXiv:2603.23940v1 Announce Type: cross Abstract: The proliferation of AIGC-driven face manipulation and deepfakes poses severe threats to media provenance, int
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Language-Guided Structure-Aware Network for Camouflaged Object Detection
arXiv:2603.24355v1 Announce Type: cross Abstract: Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with the background in t
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
SEGAR: Selective Enhancement for Generative Augmented Reality
arXiv:2603.24541v1 Announce Type: cross Abstract: Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting f
DeepCamp AI