Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique's origins. He also walks through other ways to fine-tune LLMs, and how he believes generative AI might democratize education.

This episode is brought to you by AWS Inferentia (go.aws/3zWS0au) and AWS Trainium (go.aws/3ycV6K0), and Crawlbase (crawlbase.com), the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.

In this episode you will learn:
• Why it is important that AI is open [03:13]
• The efficacy and scalability of direct preference optimization [07:32]
• Robotics and LLMs [14:32]
• The challenges of aligning reward models with human preferences [23:00]
• How to make sure AI's decision-making on preferences reflects desirable behavior [28:52]
• Why Nathan believes AI is closer to alchemy than science [37:38]

Additional materials: www.superdatascience.com/791