New Open Audio Models | Recap with Jeff
This video covers the latest wave of open audio tooling, from Mistral's Voxtral 4B text-to-speech model to Cohere Transcribe for speech recognition and the Hugging Face infrastructure used to run large-scale transcription workflows. It walks through live demos, browser-based transcription with Transformers.js, and a practical UV-script pipeline built on storage buckets, HF Mount, and HF Jobs. If you're building speech apps or batch transcription systems, this is a fast overview of the current open stack.
---
Demo Links
Voxtral TTS: https://huggingface.co/spaces/mistralai/voxtral-tts-demo
Cohere Transcribe: https://huggingface.co/spaces/CohereLabs/Cohere-Transcribe-WebGPU
UV scripts for transcription: https://huggingface.co/datasets/uv-scripts/transcription
---
Topics Covered
- Voxtral 4B text-to-speech
- Cohere Transcribe speech-to-text
- Hugging Face audio pipelines
---
Timestamps
0:00 Open audio models and demos
2:44 What Hugging Face storage buckets are
3:39 How HF Mount works
4:02 HF Jobs and wrap-up