Your AI Is ~8× More Expensive in Some Languages — Here's Why
Non-English prompts can inflate token counts dramatically. In our demo with the cl100k_base tokenizer, the same sentence takes 15 tokens in English, 21 in Spanish (×1.4), and 115 in Telugu (×7.7). Because API usage is billed per token, that ratio translates directly into higher cost. Exact counts vary by model and tokenizer; this video explains why training data and tokenization create a hidden “Token Tax”, and how to plan for it.
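The arithmetic behind the “Token Tax” fits in a few lines. This is a minimal Python sketch, not the code used in the video. It hard-codes the three demo counts quoted above, expresses each one as a multiple of the English count, and estimates spend under the assumption that billing is linear in tokens. `PRICE_PER_MILLION_TOKENS`, `token_tax`, and `estimate_cost` are hypothetical names, and the $2.50 rate is a placeholder, not any provider's actual price.

```python
from __future__ import annotations

# Token counts from the cl100k_base demo (same sentence, three languages).
DEMO_TOKEN_COUNTS = {"English": 15, "Spanish": 21, "Telugu": 115}

# Hypothetical placeholder price in USD per 1M input tokens; use your provider's real rate.
PRICE_PER_MILLION_TOKENS = 2.50


def token_tax(counts: dict[str, int], baseline: str = "English") -> dict[str, float]:
    """Return each language's token count as a multiple of the baseline language's count."""
    base = counts[baseline]
    return {lang: n / base for lang, n in counts.items()}


def estimate_cost(tokens_per_request: int, requests: int, price_per_million: float) -> float:
    """Estimate spend in USD, assuming cost scales linearly with total tokens."""
    return tokens_per_request * requests * price_per_million / 1_000_000


if __name__ == "__main__":
    ratios = token_tax(DEMO_TOKEN_COUNTS)
    for lang, n in DEMO_TOKEN_COUNTS.items():
        cost = estimate_cost(n, requests=1_000_000, price_per_million=PRICE_PER_MILLION_TOKENS)
        print(f"{lang:<8} {n:>4} tokens  x{ratios[lang]:.1f}  ${cost:,.2f} per 1M requests")
```

Because cost scales linearly with tokens, the ratio carries straight through to the bill: at any per-token price, the Telugu request costs about 7.7× the English one. To get counts for your own text, OpenAI's tiktoken library exposes the same encoding, e.g. `len(tiktoken.get_encoding("cl100k_base").encode(text))`.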
This presentation is inspired by the core concepts in the book "AI Engineering" by Chip Huyen. If you want a deeper dive into these topics, I highly recommend checking it out.
Timestamps
00:00 - The ~8x AI Cost Nobody Warns You About
00:50 - Problem: Why AI is an English-First World
01:34 - The AI's Library: Common Crawl
02:32 - The Under-representation Crisis (The Official Numbers)
03:56 - DEMO: Proving the 7.7x "Token Tax"
05:03 - How This Impacts Your Projects
05:58 - What This Means for the Future of AI
06:40 - Your Mission & Next Steps
Connect & Subscribe:
🎓 Join our FREE AI Engineering Community on Discord: https://discord.gg/rQMxdJJC
🔔 Subscribe for our next series:
https://www.youtube.com/@UCf12NnZycD7LB8prrgdTOyg
#artificialintelligence
#ai
#aiengineering