Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique's origins. He also walks through other ways to fine-tune LLMs, and how he believes generative AI might democratize education.

This episode is brought to you by AWS Inferentia (go.aws/3zWS0au) and AWS Trainium (go.aws/3ycV6K0), and Crawlbase (crawlbase.com), the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.

In this episode you will learn:
• Why it is important that AI is open [03:13]
• The efficacy and scalability of direct preference optimization [07:32]
• Robotics and LLMs [14:32]
• The challenges of aligning reward models with human preferences [23:00]
• How to make sure AI's decision-making on preferences reflects desirable behavior [28:52]
• Why Nathan believes AI is closer to alchemy than science [37:38]

Additional materials: www.superdatascience.com/791