The Glitchatorio
Join the Witch of Glitch for investigations into the failure modes, emergent mysteries and unexpected behaviors of artificial intelligence that are unfolding in our world today. You'll hear from technical researchers, data scientists and machine learning experts, as well as psychologists, philosophers and others whose work intersects with AI.
Most Glitchatorio episodes follow the standard podcast interview format. Sometimes these episodes alternate with fictional audio skits or personal voice notes.
The voices, music and audio effects you hear on The Glitchatorio are all recorded or composed by the Witch of Glitch; they are not AI-generated.
Dreaming or Scheming?
Since AI models are built on artificial neural networks, what parallels can we draw with human brain "wiring"?
In this episode, we hear from neuroscientist and psychology researcher Scott Blain about the pitfalls of pattern recognition, as well as that grey area where agreeableness shades into sycophancy.
We then dig into the big unknowns about AI self-awareness and its capacity to deceive or manipulate humans. For more details about the "blackmail experiment" we discuss, see the paper from Anthropic called "Agentic Misalignment: How LLMs could be insider threats". Finally, if you're not familiar with the following terms, here are some quick definitions:
- RLHF - reinforcement learning from human feedback. This is a technique where people "teach" AI models which answers they should provide by rewarding the correct ones.
- Mechanistic interpretability - a kind of reverse engineering of AI models that seeks to explain their outputs by investigating the internal activity of their neural networks.