9 Examples of Specification Gaming
AI systems do what you say, and it's hard to say exactly what you mean.
Let's look at a list of real-life examples of specification gaming!
How to Help: https://aisafety.info/questions/8TJV/...
https://www.aisafety.com/
Related Videos from me:
Reward Hacking: https://youtu.be/92qDfT8pENs
Reward Hacking Reloaded: https://youtu.be/46nsTFfsBuc
What Can We Do About Reward Hacking?: https://youtu.be/13tZ9Yia71c
Sources:
The list: http://tinyurl.com/specification-gaming
The blogpost this video is based on: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/
The newer blogpost that happened while I was making this video: https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
Links:
Eliciting latent knowledge from AI: https://www.alignmentforum.org/posts/rxoBY9CMkqDsHt25t/eliciting-latent-knowledge-elk-distillation-summary
Do current AI models show deception: https://aisafety.info/questions/9AL4/
What is deceptive alignment: https://aisafety.info/questions/8EL6/
Scaling Laws: https://aisafety.info/questions/7750/
LLMs as simulators: https://aisafety.info/questions/9FQK/
(Explosion graphic from videezy.com)
Thanks to my wonderful patrons:
https://www.patreon.com/robertskmiles
Gladamas
James
Steef
Scott Worley
Chad Jones
Chris Canal
David Reid
Francisco Tolmasky
Frank Kurka
Jake Ehrlich
JJ Hepboin
Kellen lask
Michael Andregg
Pedro A Ortega
Peter Rolf
Said Polat
Teague Lasser
Allen Faure
Bryce Daifuku
Clemens Arbesser
Eric James
Erik de Bruijn
Jason Hise
jugettje dutchking
Ludwig Schubert
Qeith Wreid
Andrew Harcourt
anul kumar sinha
Ben Glanton
Benjamin Watkin
Cooper Lawton
Duncan Orr
Eric Scammell
Euclidean Plane
Ian Munro
Igor Keller
Ingvi Gautsson
James Hinchcliffe
Jeroen De Dauw
Jon Halliday
Jonatan R
Julius Brash
Jérôme Beaulieu
Laura Olds
Luc Ritchie
Lupuleasa Ionuț
Michael Greve
Nathan Fish
Nicholas Guyett
Paul Hobbs
Sean Gibat
Sebastian Birjoveanu
Shevis Johnson
Taras Bobrovytsky
Tim Neilson
Tom O'C