12 - AI Existential Risk with Paul Christiano

Published: Dec. 2, 2021, 2:37 a.m.

Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views on how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks of various techniques to reduce this risk.

Topics we discuss, and timestamps:

- 00:00:38 - How AI may pose an existential threat

  - 00:13:36 - AI timelines

  - 00:24:49 - Why we might build risky AI

  - 00:33:58 - Takeoff speeds

  - 00:51:33 - Why AI could have bad motivations

  - 00:56:33 - Lessons from our current world

  - 01:08:23 - "Superintelligence"

- 01:15:21 - Technical causes of AI x-risk

  - 01:19:32 - Intent alignment

  - 01:33:52 - Outer and inner alignment

  - 01:43:45 - Thoughts on agent foundations

- 01:49:35 - Possible technical solutions to AI x-risk

  - 01:49:35 - Imitation learning, inverse reinforcement learning, and ease of evaluation

  - 02:00:34 - Paul's favorite outer alignment solutions

    - 02:01:20 - Solutions researched by others

    - 02:06:13 - Decoupling planning from knowledge

  - 02:17:18 - Factored cognition

  - 02:25:34 - Possible solutions to inner alignment

- 02:31:56 - About Paul

  - 02:31:56 - Paul's research style

  - 02:36:36 - Disagreements and uncertainties

  - 02:46:08 - Some favorite organizations

  - 02:48:21 - Following Paul's work

The transcript: axrp.net/episode/2021/12/02/episode-12-ai-xrisk-paul-christiano.html

Paul's blog posts on AI alignment: ai-alignment.com

Material that we mention:

- Cold Takes - The Most Important Century: cold-takes.com/most-important-century

- Open Philanthropy reports on:

  - Modeling the human trajectory: openphilanthropy.org/blog/modeling-human-trajectory

  - The computational power of the human brain: openphilanthropy.org/blog/new-report-brain-computation

  - AI timelines (draft): alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines

  - Whether AI could drive explosive economic growth: openphilanthropy.org/blog/report-advanced-ai-drive-explosive-economic-growth

- Takeoff speeds: sideways-view.com/2018/02/24/takeoff-speeds

- Superintelligence: Paths, Dangers, Strategies: en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies

- Wei Dai on metaphilosophical competence:

  - Two neglected problems in human-AI safety: alignmentforum.org/posts/HTgakSs6JpnogD6c2/two-neglected-problems-in-human-ai-safety

  - The argument from philosophical difficulty: alignmentforum.org/posts/w6d7XBCegc96kz4n3/the-argument-from-philosophical-difficulty

  - Some thoughts on metaphilosophy: alignmentforum.org/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy

- AI safety via debate: arxiv.org/abs/1805.00899

- Iterated distillation and amplification: ai-alignment.com/iterated-distillation-and-amplification-157debfd1616

- Scalable agent alignment via reward modeling: a research direction: arxiv.org/abs/1811.07871

- Learning the prior: alignmentforum.org/posts/SL9mKhgdmDKXmxwE4/learning-the-prior

- Imitative generalisation (AKA 'learning the prior'): alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1

- When is unaligned AI morally valuable?: ai-alignment.com/sympathizing-with-ai-e11a4bf5ef6e