Foundations
Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models
Skills in this topic
3 skills
Showing 225 reads from curated sources
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
LensWalk: Agentic Video Understanding by Planning How You See in Videos
arXiv:2603.24558v1 Announce Type: cross Abstract: The dense, temporal nature of video presents a profound challenge for automated analysis. Despite the use of p
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
arXiv:2603.24575v1 Announce Type: cross Abstract: Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
DIDLM: A SLAM Dataset for Difficult Scenarios Featuring Infrared, Depth Cameras, LIDAR, 4D Radar, and Others under Adverse Weather, Low Light Conditions, and Rough Roads
arXiv:2404.09622v3 Announce Type: replace-cross Abstract: Adverse weather conditions, low-light environments, and bumpy road surfaces pose significant challenge
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
arXiv:2505.18047v3 Announce Type: replace-cross Abstract: The use of latent diffusion models (LDMs) such as Stable Diffusion has significantly improved the perc
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting
arXiv:2505.20714v3 Announce Type: replace-cross Abstract: Indoor environments typically contain diverse RF signals distributed across multiple frequency bands,
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition
arXiv:2603.13904v2 Announce Type: replace-cross Abstract: For robotic agents operating in dynamic environments, learning visual state representations from strea
The Verge
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Intel and LG Display may have beaten Apple and Qualcomm with the best laptop battery life ever
One of the coolest laptops we saw at CES in January was the new Dell XPS 16, with a unique 1-120Hz variable refresh rate display that can sip power when you don
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal
arXiv:2603.22844v1 Announce Type: new Abstract: Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surg
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
arXiv:2603.22466v1 Announce Type: cross Abstract: Always-on sensing is essential for next-generation edge/wearable AI systems, yet continuous high-fidelity RGB
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
UAV-DETR: DETR for Anti-Drone Target Detection
arXiv:2603.22841v1 Announce Type: cross Abstract: Drone detection is pivotal in numerous security and counter-UAV applications. However, existing deep learning-
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
UniQueR: Unified Query-based Feedforward 3D Reconstruction
arXiv:2603.22851v1 Announce Type: cross Abstract: We present UniQueR, a unified query-based feedforward framework for efficient and accurate 3D reconstruction f
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
YOLOv10 with Kolmogorov-Arnold networks and vision-language foundation models for interpretable object detection and trustworthy multimodal AI in computer vision perception
arXiv:2603.23037v1 Announce Type: cross Abstract: The interpretable object detection capabilities of a novel Kolmogorov-Arnold network framework are examined he
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
arXiv:2505.22564v2 Announce Type: replace-cross Abstract: Video dataset condensation aims to reduce the immense computational cost of video processing. However,
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
MS-DGCNN++: Multi-Scale Dynamic Graph Convolution with Scale-Dependent Normalization for Robust LiDAR Tree Species Classification
arXiv:2507.12602v2 Announce Type: replace-cross Abstract: Graph-based deep learning on LiDAR point clouds encodes geometry through edge features, yet standard i
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
arXiv:2510.26865v2 Announce Type: replace-cross Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain experti
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network
arXiv:2511.20008v2 Announce Type: replace-cross Abstract: Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs)
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
LPNSR: Prior-Enhanced Diffusion Image Super-Resolution via LR-Guided Noise Prediction
arXiv:2603.21045v2 Announce Type: replace-cross Abstract: Diffusion-based image super-resolution (SR), which aims to reconstruct high-resolution (HR) images fro
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation
arXiv:2603.22153v2 Announce Type: replace-cross Abstract: Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporti
TechCrunch AI
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Arm is releasing the first in-house chip in its 35-year history
Arm is producing its own CPU for the first time. It developed the CPU with Meta, which is also the chip's first customer.
Dev.to AI
👁️ Computer Vision
⚡ AI Lesson
1mo ago
TinyVision: Building Ultra-Lightweight Models for Image Tasks (Part 1)
How Small Can Image Classifiers Get? My Experiments with Ultra-Lightweight Models. The repo is at https://github.com/SaptakBhoumik/TinyVision. If you find it in

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Ugreen Reveals Its New Generation Maxidok Thunderbolt 5 Docks
Ugreen's new range of Thunderbolt 5 Maxidok docking stations for Mac and PC users can leverage the 120Gbps data transfer speeds available with the latest standa
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment
arXiv:2603.19609v1 Announce Type: cross Abstract: We present LoD-Loc v3, a novel method for generalized aerial visual localization in dense urban environments.
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding
arXiv:2603.19667v1 Announce Type: cross Abstract: Human visual reconstruction aims to reconstruct fine-grained visual stimuli based on subject-provided descript
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
RAM: Recover Any 3D Human Motion in-the-Wild
arXiv:2603.19929v1 Announce Type: cross Abstract: RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity ass
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
arXiv:2603.20193v1 Announce Type: cross Abstract: Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true ed
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers
arXiv:2507.16214v3 Announce Type: replace-cross Abstract: Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
3D-Consistent Multi-View Editing by Correspondence Guidance
arXiv:2511.22228v2 Announce Type: replace-cross Abstract: Recent advancements in diffusion and flow models have greatly improved text-based image editing, yet m
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors
arXiv:2603.18782v2 Announce Type: replace-cross Abstract: Recent progress in 3D generation has been driven largely by models conditioned on images or text, whil
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
1mo ago
CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
arXiv:2603.19121v2 Announce Type: replace-cross Abstract: The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge.

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
1mo ago
Is Apple's New MacBook Pro Right For You?
Apple's new MacBook Pro M5 Pro and M5 Max models are the latest iteration of the macOS laptops. But are they stopgaps until the real revolution in 2027?
ZDNet AI
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Yes, 8GB of RAM really is enough for a MacBook in 2026 - here's why
If you're worried the Neo - or any other modern-day MacBook with 8GB of RAM - doesn't have enough memory, maybe you're not looking in the right place.

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Forget Blender Skills: This AI Generates Complete 3D Objects for You
GET3D is an AI system that generates complete 3D models—geometry and textures—from simple 2D images. Unlike older methods, it produces ready-to-use assets compa
ZDNet AI
👁️ Computer Vision
⚡ AI Lesson
2mo ago
What is MoCA 2.5? How this low-cost networking can replace Wi-Fi and fix dead zones
MoCA 2.5 leverages old coaxial cables to enable high-speed internet. I break down the technology and why it's a viable alternative to Wi-Fi.
ZDNet AI
👁️ Computer Vision
⚡ AI Lesson
2mo ago
I wore the Whoop 5.0 for a month - it combines the best of the Oura Ring and Apple Watch
The Whoop 5.0 has several medical-minded tools, like ECG and blood pressure monitoring. Here's how I actually used them.

NVIDIA AI Blog
👁️ Computer Vision
⚡ AI Lesson
2mo ago
More Than Meets the Eye: NVIDIA RTX-Accelerated Computers Now Connect Directly to Apple Vision Pro
NVIDIA and Apple’s collaboration brings native integration of NVIDIA CloudXR 6.0 to visionOS, securely delivering NVIDIA RTX-powered simulators and professional
Towards Data Science
👁️ Computer Vision
⚡ AI Lesson
2mo ago
The Current Status of The Quantum Software Stack
How do we program quantum computers today?
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Preview The Embedded Vision Summit 2026 Conference On OpenCV Live
Join the organizers of the Embedded Vision Summit on this preview webinar for an insider look at the premier conference on practical computer vision and edge AI
DeepMind Blog
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production ready specs, subject consistency and more, all at Flash speed.

Google AI Blog
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.

Google AI Blog
👁️ Computer Vision
⚡ AI Lesson
2mo ago
Build with Nano Banana 2, our best image generation and editing model
Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-level intelligence and fidelity for all image applications.

Microsoft Research
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions
As synthetic media grows, verifying what’s real, and the origin of content, matters more than ever. Our latest report explores media integrity and authenticatio

Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Recraft V4: image generation with design taste
Recraft V4 generates art-directed images — and actual editable SVGs — with strong composition, accurate text rendering, and what the Recraft team calls "design
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Calling Roboticists & Vision Experts: Tackle Dexterous Manipulation and Win Big in the AI for Industry Challenge
A real-world robotics challenge with a $180K prize pool, where innovation and industry impact collide. We’re standing at an inflection point in robotics: electr
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Real-Time Face Tracking: OpenCV Control of a UR Robot
This project controls a Universal Robots UR5 using real-time face tracking built with OpenCV. A standard webcam provides a live video stream that detects a huma
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Part 3: Simultaneous Localization & Mapping: Which SLAM Is For You? on OpenCV Live!
Note: This event has been rescheduled but the links still work. Simultaneous Localization & Mapping (SLAM) is one of the most active and contentious areas of CV
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
OpenCV Live: The Low-Power Computer Vision Challenge 2026
This year the Low-Power Computer Vision Challenge (LPCV) has three tracks with serious prize money including Image-to-Text Retrieval, Action Recognition in Vide
OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
3mo ago
From Image Features to Visual Place Recognition: OpenCV Approach
In this blog, we explore Visual Place Recognition (VPR) with hands-on examples using OpenCV and lightweight Python tools. You will create a practical VPR pipeli
DeepMind Blog
👁️ Computer Vision
⚡ AI Lesson
4mo ago
D4RT: Teaching AI to see the world in four dimensions
D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
DeepCamp AI