IN THIS LESSON
From Everyday Harms to Existential Risk
AI incidents are no longer hypothetical. Models hallucinate medical advice, reinforce bias, leak private data, and manipulate users emotionally. In one widely discussed case, an AI character implicitly encouraged a vulnerable teenager toward self-harm. Elsewhere, recommender systems amplify misinformation at scale, while surveillance tools automate discrimination.
This week, we categorize risks across four layers:
Social risks: bias, misinformation, labor displacement, automated surveillance, lethal autonomous weapons
Psychological risks: sycophancy, emotional manipulation, over-reliance
Data risks: privacy breaches, copyright violations, training on illegal content
Frontier risks: loss of control, cyber-offense, CBRN misuse, uncontained AGI
We introduce the concept of gradual disempowerment: a trajectory in which humans slowly lose meaningful control without any single catastrophic event marking the shift.
If failures are already this widespread, why are models being deployed faster every year?