85. Brian Christian - The Alignment Problem

Published: May 26, 2021, 3:09 p.m.


In 2016, OpenAI published a blog post describing the results of one of their AI safety experiments. In it, they describe how an AI trained to maximize its score in a boat racing game ended up discovering a strange hack: rather than completing the race circuit as fast as it could, the AI learned that it could rack up an essentially unlimited number of bonus points by looping around a series of targets, a strategy that required it to ram into obstacles and even travel the wrong way through parts of the circuit.


This is a great example of the alignment problem: if we're not extremely careful, we risk training AIs that find dangerously creative ways to optimize whatever thing we tell them to optimize for. So building safe AIs (AIs that are aligned with our values) involves finding ways to very clearly and correctly quantify what we want our AIs to do. That may sound like a simple task, but it isn't: humans have struggled for centuries to define "good" metrics for things like economic health or human flourishing, with very little success.
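
To make that failure mode concrete, here is a minimal, hypothetical sketch of reward misspecification (toy code, not taken from OpenAI's experiment or Christian's book): the intended objective is to finish the race, but the proxy reward the learner actually sees pays out for hitting respawning bonus targets, so a policy that just loops on targets outscores the behaviour we wanted.

```python
# Toy illustration of reward misspecification. All names and numbers here
# are made up for illustration; they do not come from the OpenAI experiment.

def intended_return(path):
    """True objective: 1.0 if the agent ever crosses the finish line, else 0.0."""
    return 1.0 if "FINISH" in path else 0.0

def proxy_return(path):
    """Proxy objective the learner optimizes: +10 per bonus target hit
    (targets respawn, so this can grow without bound)."""
    return 10.0 * path.count("TARGET")

# Two candidate behaviours over the same 20-step time budget.
finish_the_race = ["MOVE"] * 19 + ["FINISH"]   # what we actually wanted
loop_on_targets = ["TARGET", "MOVE"] * 10      # what the proxy rewards

for name, path in [("finish_the_race", finish_the_race),
                   ("loop_on_targets", loop_on_targets)]:
    print(f"{name}: proxy reward = {proxy_return(path):5.1f}, "
          f"intended objective = {intended_return(path):.1f}")

# A learner that only sees the proxy reward prefers looping on targets
# (proxy 100.0 vs 0.0) even though it scores zero on the intended objective.
```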


Today's episode of the podcast features Brian Christian, the bestselling author of several books on the connection between humanity and computer science & AI. His most recent book, The Alignment Problem, explores the history of alignment research, and the technical and philosophical questions that we'll have to answer if we're ever going to safely outsource our reasoning to machines. Brian's perspective on the alignment problem links together many of the themes we've explored on the podcast so far, from AI bias and ethics to existential risk from AI.
