Open Source Models with Hugging Face

Free to audit on Coursera

Coursera · Intermediate · Computer Vision · 3mo ago

Skills: Modern CV Models (80%) · AI-Assisted Code Review (50%)

Key Takeaways

The course uses the Hugging Face Hub and the transformers library to perform NLP, audio, image, and multimodal tasks, then packages the code into a user-friendly app with Gradio.

Original Description

The availability of models and their weights for anyone to download enables a broader range of developers to innovate and create. In this course, you’ll select open source models from Hugging Face Hub to perform NLP, audio, image and multimodal tasks using the Hugging Face transformers library. Easily package your code into a user-friendly app that you can run on the cloud using Gradio and Hugging Face Spaces.

You will:

1. Use the transformers library to turn a small language model into a chatbot capable of multi-turn conversations to answer follow-up questions.
2. Translate between languages, summarize documents, and measure the similarity between two pieces of text, which can be used for search and retrieval.
3. Convert audio to text with Automatic Speech Recognition (ASR), and convert text to audio using Text to Speech (TTS).
4. Perform zero-shot audio classification, to classify audio without fine-tuning the model.
5. Generate an audio narration describing an image by combining object detection and text-to-speech models.
6. Identify objects or regions in an image by prompting a zero-shot image segmentation model with points to identify the object that you want to select.
7. Implement visual question answering, image search, image captioning and other multimodal tasks.
8. Share your AI app using Gradio and Hugging Face Spaces to run your applications in a user-friendly interface on the cloud or as an API.

The course will provide you with the building blocks that you can combine into a pipeline to build your AI-enabled applications!
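The text-similarity task mentioned in the description is commonly implemented as a cosine similarity between embedding vectors, with documents ranked by score for search and retrieval. The following is only a minimal sketch of that idea, not the course's code: it assumes plain word-count vectors as a stand-in for model embeddings, and the helper names `embed`, `cosine_similarity`, and `rank` are hypothetical. In practice the `embed` step would be replaced by a model downloaded from the Hugging Face Hub.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a model embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (score, document) pairs, most similar to the query first."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank("a cat on a mat", docs)` on a small list of strings returns the documents ordered by word overlap with the query. Word counts only capture exact matches, which is the gap learned embeddings are meant to close.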

More on: Modern CV Models

YOLOE: Real-time Zero-shot Object Detection | Visual Prompting | Live Coding & Q&A (Mar 14th)

Statistical Learning: 10.Py Convolutional Neural Network: CIFAR Image Data I 2023

Stanford Online

RF-DETR: How to Train SOTA for Object Detection on a Custom Dataset | Step-by-step guide

Build a Deep Facial Recognition App // Part 8 - Kivy Computer Vision App with OpenCV and Tensorflow

Nicholas Renotte

Deep Learning with PyTorch : Image Segmentation

Mesh Optimization Using FlexiCubes with NVIDIA Kaolin Library v0.15.0

NVIDIA Developer

Related Reads

AI 3D Object Reconstruction for Crime Scenes

Learn how AI 3D object reconstruction can aid crime scene investigations by creating detailed, accurate models of evidence and environments.

What Is YOLOv8? An Introduction to the YOLOv8 Model Family

Learn about YOLOv8, a family of models for computer vision tasks, and why it offers multiple variants.

Medium · Data Science

Mistral's 8B Robostral Navigate outperforms multi-sensor robots

Mistral's 8B Robostral Navigate achieves superior performance with a single RGB camera, outperforming multi-sensor robots.

Dev.to · ironbyte-rgb

9-Phase Computer Vision Roadmap 2026 | AI & Deep Learning | #shorts