79. Ryan Carey - What does your AI want?

Published: April 14, 2021, 5:44 p.m.


AI safety researchers are increasingly focused on understanding what AI systems want. That may sound like an odd thing to care about: after all, aren't we just programming AIs to want certain things by providing them with a loss function, or a number to optimize?


Well, not necessarily. It turns out that AI systems can have incentives that aren't obvious from their initial programming. Twitter, for example, runs a recommender system whose job is nominally to figure out what tweets you're most likely to engage with. You might think that means it should be optimizing for matching tweets to people, but another way Twitter can achieve its goal is by matching people to tweets: that is, by making people easier to predict, nudging them towards simplistic and partisan views of the world. Some have argued that this is a key reason social media has had such a divisive impact on online political discourse.
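As a rough illustration of that dynamic, here is a minimal toy sketch (not Twitter's actual system, and the "drift" assumption that exposure nudges a user's preferences toward whatever is shown is mine): a recommender is scored on how often it predicts which of three topics a user will engage with, and a policy that narrows the user ends up scoring higher than one that matches content to the user's existing, diverse interests.

import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative only): a user has preferences over 3 topics, and the
# recommender is scored on how often it predicts which topic the user engages with.
# Assumed dynamic: exposure nudges the user's preferences toward whatever is shown.

def simulate(policy, steps=200, drift=0.05):
    prefs = np.array([0.40, 0.35, 0.25])    # fairly diverse starting preferences
    hits = 0
    for _ in range(steps):
        shown = policy(prefs)                # topic the recommender shows
        clicked = rng.choice(3, p=prefs)     # topic the user actually engages with
        hits += int(shown == clicked)        # recommender "predicted" the engagement
        prefs = (1 - drift) * prefs          # preferences drift toward what was shown
        prefs[shown] += drift
    return hits / steps

def match_tweets_to_person(prefs):
    # Show topics in proportion to the user's current interests.
    return rng.choice(3, p=prefs)

def match_person_to_tweets(prefs):
    # Always push one topic, making the user steadily easier to predict.
    return 0

print("match tweets to person:", simulate(match_tweets_to_person))
print("match person to tweets:", simulate(match_person_to_tweets))

Under these deliberately crude assumptions, the narrowing policy scores noticeably higher, even though it serves the user's current interests less well. The point is just that "predict engagement" does not, by itself, pin down which of these strategies the system has an incentive to pursue.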


So the incentives of many current AIs already deviate from those of their programmers in significant ways, ways that are already shaping society. But there's a bigger reason they matter: as AI systems continue to develop more capabilities, inconsistencies between their incentives and our own will become more and more important. That's why my guest for this episode, Ryan Carey, has focused much of his research on identifying and controlling the incentives of AIs. Ryan is a former medical doctor, now pursuing a PhD in machine learning and doing research on AI safety at Oxford University's Future of Humanity Institute.
