Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,346 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline

Showing 225 reads from curated sources

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
arXiv:2603.29777v1 Announce Type: cross Abstract: Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines
arXiv:2603.29927v1 Announce Type: cross Abstract: Transferring large volumes of high-resolution images during wind turbine inspections introduces a bottleneck i
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Streaming 4D Visual Geometry Transformer
arXiv:2507.11539v2 Announce Type: replace-cross Abstract: Perceiving and reconstructing 3D geometry from videos is a fundamental yet challenging computer vision
Hackernoon 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Background-removal model by Pixelcut: A Model Overview
background-removal is an AI-powered tool created by Pixelcut that handles the task of removing backgrounds from images with precision and speed.
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 1mo ago
When the Track Is Your Lab: Meet the Team Racing Without a Driver
What does it take to build an AI that competes in professional motorsports — no driver, no remote control, just autonomous decision-making at race speed? Find o
ArsTechnica Tech 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Quantum computers need vastly fewer resources than thought to break vital encryption
is coming, and it won't be as expensive as thought.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
An End-to-end Flight Control Network for High-speed UAV Obstacle Avoidance based on Event-Depth Fusion
arXiv:2603.27181v1 Announce Type: cross Abstract: Achieving safe, high-speed autonomous flight in complex environments with static, dynamic, or mixed obstacles
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Guided Lensless Polarization Imaging
arXiv:2603.27357v1 Announce Type: cross Abstract: Polarization imaging captures the polarization state of light, revealing information invisible to the human ey
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Attend The OpenCV-SID Conference On Computer Vision & AI This May 4th
OpenCV is continuing our partnership with the awesome Display Week conference, joining them in Los Angeles this May 4th for a special one-day event packed with
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
arXiv:2603.25863v1 Announce Type: cross Abstract: This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification
arXiv:2603.25935v1 Announce Type: cross Abstract: This work presents a new Hybrid Dense SwinV2, a two-branch framework that jointly leverages densely connected
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
arXiv:2603.25946v1 Announce Type: cross Abstract: High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by t
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
arXiv:2603.26015v1 Announce Type: cross Abstract: Human age estimation from facial images represents a challenging computer vision task with significant applica
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting
arXiv:2603.26067v1 Announce Type: cross Abstract: Physical adversarial camouflage poses a severe security threat to autonomous driving systems by mapping advers
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
An Object Web Seminar: A Retrospective on a Technical Dialogue Still Reverberating
arXiv:2603.26203v1 Announce Type: cross Abstract: Technology change happens quickly such that new trends tend to crowd out the focus on what was new just yester
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation
arXiv:2603.26260v1 Announce Type: cross Abstract: Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond the training set. Existin
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
arXiv:2603.26551v1 Announce Type: cross Abstract: Vision backbone networks play a central role in modern computer vision. Enhancing their efficiency directly be
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
arXiv:2603.26653v1 Announce Type: cross Abstract: We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric vide
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Binary Verification for Zero-Shot Vision
arXiv:2511.10983v2 Announce Type: replace-cross Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Particulate: Feed-Forward 3D Object Articulation
arXiv:2512.11798v2 Announce Type: replace-cross Abstract: We introduce Particulate, a feed-forward model that, given a 3D mesh of an object, infers its articula
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution
arXiv:2601.07855v2 Announce Type: replace-cross Abstract: For 3D perception systems to operate reliably in real-world environments, they must remain robust to e
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
arXiv:2601.13622v3 Announce Type: replace-cross Abstract: Large vision-language models (LVLMs) are typically trained using autoregressive language modeling obje
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Towards single-shot coherent imaging via overlap-free ptychography
arXiv:2602.21361v2 Announce Type: replace-cross Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throu
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
arXiv:2603.14375v2 Announce Type: replace-cross Abstract: While recent generative video models have achieved remarkable visual realism and are being explored as
Forbes Innovation 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Google Confirms High-Risk Update For 3.5 Billion Chrome Users
Nearly all 3.5 billion Chrome browser users will soon see a ‘high-risk’ security update from Google. Here’s what you need to know.
Forbes Innovation 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Uh Oh—New ‘Hack Yourself’ Apple Mac Attack Can Steal Your Passwords
A newly discovered attack sandbags Apple users into hacking themselves. Here’s what all Mac users need to know.
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
$58.3B in Synthetic Fraud Warns Investigators: "I Eyeballed It" Won't Hold Up Much Longer
The $58 Billion Synthetic Identity Crisis For developers building computer vision pipelines, biometric authentication, or OSINT tools, the latest fraud projecti
Hackernoon 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building Ultra-Lightweight Image Classifiers with TinyVision (Part 1)
This article explores how small image classification models can get while remaining effective. Using handcrafted feature pipelines and compact CNN architectures
Hackernoon 👁️ Computer Vision ⚡ AI Lesson 1mo ago
When Verified Source Lies
I deployed a staking vault on Sepolia and got it verified on Etherscan with a green checkmark. The source code contains a storage write that does not exist in t
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Is Geometry Enough? An Evaluation of Landmark-Based Gaze Estimation
arXiv:2603.24724v1 Announce Type: cross Abstract: Appearance-based gaze estimation frequently relies on deep Convolutional Neural Networks (CNNs). These models
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
arXiv:2603.25109v1 Announce Type: cross Abstract: Data augmentation is a key technique for improving the robustness of image classification models. However, man
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling
arXiv:2603.25170v1 Announce Type: cross Abstract: In complex environments, infrared object detection exhibits broad applicability and stability across diverse s
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Image Rotation Angle Estimation: Comparing Circular-Aware Methods
arXiv:2603.25351v1 Announce Type: cross Abstract: Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challen
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case
arXiv:2603.25510v1 Announce Type: cross Abstract: The use of hyperspectral imaging (HSI) in autonomous driving (AD), while promising, faces many challenges rela
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
arXiv:2603.25524v1 Announce Type: cross Abstract: Long-term behavioral monitoring of individual animals is crucial for studying behavioral changes that occur ov
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming
arXiv:2603.25686v1 Announce Type: cross Abstract: Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-refere
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
PixelSmile: Toward Fine-Grained Facial Expression Editing
arXiv:2603.25728v1 Announce Type: cross Abstract: Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, w
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Generative deep learning for foundational video translation in ultrasound
arXiv:2511.03255v2 Announce Type: replace-cross Abstract: Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting
arXiv:2601.03824v3 Announce Type: replace-cross Abstract: Generalizable 3D Gaussian Splatting aims to directly predict Gaussian parameters using a feed-forward
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy
arXiv:2602.01939v3 Announce Type: replace-cross Abstract: Recently, active vision has reemerged as an important concept for manipulation, since visual occlusion
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Monocular Normal Estimation via Shading Sequence Estimation
arXiv:2602.09929v5 Announce Type: replace-cross Abstract: Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
arXiv:2603.00141v3 Announce Type: replace-cross Abstract: Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by e
Microsoft Research 👁️ Computer Vision ⚡ AI Lesson 1mo ago
AsgardBench: A benchmark for visually grounded interactive planning
Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Estimating Individual Tree Height and Species from UAV Imagery
arXiv:2603.23669v1 Announce Type: cross Abstract: Accurate estimation of forest biomass, a major carbon sink, relies heavily on tree-level traits such as height
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Revealing Multi-View Hallucination in Large Vision-Language Models
arXiv:2603.23934v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from d
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking
arXiv:2603.23940v1 Announce Type: cross Abstract: The proliferation of AIGC-driven face manipulation and deepfakes poses severe threats to media provenance, int
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Language-Guided Structure-Aware Network for Camouflaged Object Detection
arXiv:2603.24355v1 Announce Type: cross Abstract: Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with the background in t
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
SEGAR: Selective Enhancement for Generative Augmented Reality
arXiv:2603.24541v1 Announce Type: cross Abstract: Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting f