The Glitchatorio
30-minute introductions to some of the trickiest issues around AI today, such as:
- The alignment problem
- Questions of LLM consciousness
- AI and animal welfare
- Scheming and hallucinations
The Glitchatorio is a podcast about the aspects of AI that don't fit neatly into marketing messages or notions of technology-as-destiny. We look into the failure modes, emergent mysteries and unexpected behaviors of artificial intelligence that baffle even the experts. You'll hear from technical researchers, data scientists and machine learning experts, as well as psychologists, philosophers and others whose work intersects with AI.
Most Glitchatorio episodes follow the standard podcast interview format. Sometimes these episodes alternate with fictional audio skits or personal voice notes.
The voices, music and audio effects you hear on The Glitchatorio are all recorded or composed by the Witch of Glitch; they are not AI-generated.
Reading the Mind of AI
What if we could figure out how large language models really work by getting inside their "heads"?
It may be possible, but it relies on a controversial technique called mechanistic interpretability ("mech interp"), also known as the neuroscience of AI.
In this episode, Ihor Kendiukhov, lead researcher at SPAR (a research programme for AI risks), explains mech interp in layperson's terms, along with what it can currently reveal about the thinking processes of AIs.
We also talk about why mech interp has charted an unusually emotional course in AI research over the past few years, from physicists falling in love with it to prominent AI safety figures throwing shade on it, and where it might be headed next.
Some of the papers and announcements referenced in this episode include:
- Golden Gate Claude: https://www.anthropic.com/news/golden-gate-claude
- And the excited announcement about mind-mapping that preceded it: https://www.anthropic.com/news/mapping-mind-language-model
- An example of recent criticism of mech interp: https://ai-frontiers.org/articles/the-misguided-quest-for-mechanistic-ai-interpretability