
Demystifying AI Interpretability

This talk will attempt to demystify, for a non-technical audience, the current state of neural network explainability and interpretability, and to trace the boundaries of what is in principle possible to achieve. We will first set up the necessary background to discuss interpretability methods with stakeholders in mind, define basic concepts, and explain distinctions such as inner interpretability versus explainability. Along the way, we will touch on issues of relevance to various stakeholders; for instance, the role of interpretability in explaining how large language models generate text, in revealing reasons for model biases, and in model distillation.

Throughout, we will use a particular lens to demystify what AI interpretability is, and which goals are within or out of its reach: instead of focusing on the promises of (algorithmic) solutions for interpretability, we will focus on the properties of the (computational) problems they attempt to solve. This lens—which we call computational meta-theory—will allow us to put stakeholders’ goals at the centre and to reason about the adequacy of interpretability ‘hammers’ to hit practically meaningful ‘nails’.

Federico Adolfi is currently a postdoctoral researcher at the Ernst Strüngmann Institute for Neuroscience, Max Planck Society. He combines a background in cognitive and brain science, computer science, and music. His PhD in Computational Cognitive Science at the University of Bristol focused on establishing a conceptual and formal framework for computational meta-theory and demonstrating its application to problems in psychology, neuroscience, and artificial intelligence. One of these applications is the problem of AI interpretability, for which he and his colleagues recently provided the first formal analyses of the scope and limits of circuit discovery for interpreting neural networks.


28 March 2025 | Demystifying AI Interpretability

Find out more about the organizers of this event: the Max Planck Law, Tech, Society Initiative.
