74. Ethan Perez - Making AI safe through debate

Published: March 10, 2021, 2:46 p.m.

Most AI researchers are confident that we will one day create superintelligent systems: machines that can significantly outperform humans across a wide variety of tasks.

If this ends up happening, it will pose some potentially serious problems. Specifically: if a system is superintelligent, how can we maintain control over it? That's the core of the AI alignment problem: the problem of aligning advanced AI systems with human values.

A full solution to the alignment problem will have to involve at least two things. First, we'll have to know exactly what we want superintelligent systems to do, and make sure they don't misinterpret us when we ask them to do it (the "outer alignment" problem). But second, we'll have to make sure that those systems are genuinely trying to optimize for what we've asked them to do, and that they aren't trying to deceive us (the "inner alignment" problem).

Creating systems that are inner-aligned and systems that are superintelligent might seem like two different problems, and many think that they are. But in the last few years, AI researchers have been exploring a new family of strategies that some hope will allow us to achieve both superintelligence and inner alignment at the same time. Today's guest, Ethan Perez, is using these approaches to build language models that he hopes will form an important part of the superintelligent systems of the future. Ethan has done frontier research at Google, Facebook, and MILA, and is now working full-time on developing learning systems with generalization abilities that could one day exceed those of human beings.