Broadcasts.com - "29 - Science of Deep Learning with Vikrant Varma" (AXRP

Technology

29 - Science of Deep Learning with Vikrant Varma

Published: April 25, 2024, 6:36 p.m.

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so, how are they working? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Topics we discuss, and timestamps:

0:00:36 - Challenges with unsupervised LLM knowledge discovery, aka contra CCS

  0:00:36 - What is CCS?

  0:09:54 - Consistent and contrastive features other than model beliefs

  0:20:34 - Understanding the banana/shed mystery

  0:41:59 - Future CCS-like approaches

  0:53:29 - CCS as principal component analysis

0:56:21 - Explaining grokking through circuit efficiency

  0:57:44 - Why research science of deep learning?

  1:12:07 - Summary of the paper's hypothesis

  1:14:05 - What are 'circuits'?

  1:20:48 - The role of complexity

  1:24:07 - Many kinds of circuits

  1:28:10 - How circuits are learned

  1:38:24 - Semi-grokking and ungrokking

  1:50:53 - Generalizing the results

1:58:51 - Vikrant's research approach

2:06:36 - The DeepMind alignment team

2:09:06 - Follow-up work

The transcript: axrp.net/episode/2024/04/25/episode-29-science-of-deep-learning-vikrant-varma.html

Vikrant's Twitter/X account: twitter.com/vikrantvarma_

Main papers:

- Challenges with unsupervised LLM knowledge discovery: arxiv.org/abs/2312.10029

- Explaining grokking through circuit efficiency: arxiv.org/abs/2309.02390

Other works discussed:

- Discovering latent knowledge in language models without supervision (CCS): arxiv.org/abs/2212.03827

- Eliciting Latent Knowledge: How to Tell if your Eyes Deceive You: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit

- Discussion: Challenges with unsupervised LLM knowledge discovery: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1

- Comment thread on the banana/shed results: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1?commentId=hPZfgA3BdXieNfFuY

- Fabien Roger, What discovering latent knowledge did and did not find: lesswrong.com/posts/bWxNPMy5MhPnQTzKz/what-discovering-latent-knowledge-did-and-did-not-find-4

- Scott Emmons, Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS): lesswrong.com/posts/9vwekjD6xyuePX7Zr/contrast-pairs-drive-the-empirical-performance-of-contrast

- Grokking: Generalizing Beyond Overfitting on Small Algorithmic Datasets: arxiv.org/abs/2201.02177

- Keeping Neural Networks Simple by Minimizing the Minimum Description Length of the Weights (Hinton 1993 L2): dl.acm.org/doi/pdf/10.1145/168304.168306

- Progress measures for grokking via mechanistic interpretability: arxiv.org/abs/2301.0521

Episode art by Hamish Doodles: hamishdoodles.com