OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code

Name: OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code
Uploaded: 2022-09-24T11:11:37Z
Channel: Aleksa Gordić - The AI Epiphany

Aleksa Gordić - The AI Epiphany · Beginner · 🧠 Large Language Models · 3y ago

Skills: Multimodal LLMs 90% · Fine-tuning LLMs 80% · Prompt Craft 70% · LLM Engineering 70% · LLM Foundations 60%

❤️ Become The AI Epiphany Patreon ❤️ https://www.patreon.com/theaiepiphany
👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦 https://discord.gg/peBrCpheKE

In this video I cover Whisper, an ASR system from OpenAI's "Robust Speech Recognition via Large-Scale Weak Supervision" paper. Trained on a huge multilingual, multitask, weakly supervised dataset, it achieves very high effective robustness and accuracy, closing the gap with the human baseline using only an off-the-shelf transformer. I walk you through both the paper and the actual code. Let me know whether the code part helped!

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper: https://cdn.openai.com/papers/whisper.pdf
✅ Code: https://github.com/openai/whisper
✅ Nice explanation of mel spectrograms: https://www.youtube.com/watch?v=9GHCiiDLHQ4
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00:00 Intro
00:02:05 Paper overview
00:07:30 Collecting a large scale weakly supervised dataset
00:13:55 Evaluation metric issues (WER)
00:16:05 Effective robustness
00:18:40 Scaling laws in progress
00:26:30 Decoding is hacky
00:28:30 Code walk-through
00:30:25 Model architecture (diagram vs code)
00:33:30 Transcription task
00:34:10 Loading the audio, mel spectrograms
00:37:50 Language detection
00:45:00 Transcription task continued
00:47:35 Suppressing token logits
00:52:00 Voice activity detection
00:53:35 Decoding and heuristics
01:01:56 Outro

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️
If these videos, GitHub projects, and blogs help you, consider helping me out by supporting me on Patreon!
The AI Epiphany - https://www.patreon.com/theaiepiphany
One-time donation - https://www.paypal.com/paypalme/theaiepiphany

Huge thank you to these AI Epiphany patreons:
Eli Mahler
Petar Veličković
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

💼 LinkedIn - https://www.linkedin.com/in/aleksagordic/
🐦 Twitter - https://twitter.com/gordic_aleksa
👨‍👩‍👧‍👦 Discord - https://discord.gg/peBrCpheKE
📺 YouTube - https://www.youtub
