AI scaling has really taken off. Ever since GPT-3 came out, it’s become clear that moving beyond narrow AI and towards more generally intelligent systems will mean massively scaling up the size of our models, the amount of processing power they consume and the amount of data they’re trained on, all at the same time.
That’s led to a huge wave of highly scaled models that are incredibly expensive to train, largely because of their enormous compute budgets. But what if there were a more flexible way to scale AI, one that let us decouple model size from compute budgets and chart a more compute-efficient course to scale?
That’s the promise of so-called mixture of experts models, or MoEs. Unlike more traditional dense transformers, MoEs don’t update all of their parameters on every training pass. Instead, they route each input to a small number of sub-models called experts, which can each specialize in different tasks. On a given training pass, only the experts an input was routed to have their parameters updated. The result is a sparse model, a more compute-efficient training process, and a new potential path to scale.
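
To make the routing idea concrete, here is a minimal sketch in plain Python. It rests on some loud assumptions: the router is a simple dot product, the experts are toy functions, the gate weights are not renormalized, and names such as `route_top_k` and `moe_forward` are hypothetical rather than taken from any real library or from the guests' work. It shows only the forward pass; in training, only the selected experts (plus the router) would receive updates.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=1):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def moe_forward(x, router_weights, experts, k=1):
    """Send input x through only the k experts chosen by the router.

    router_weights: one weight vector per expert (hypothetical toy router).
    experts: list of callables, each mapping a vector to a vector.
    """
    # One router logit per expert: dot product of its weight vector with x.
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    chosen = route_top_k(logits, k)

    # Only the chosen experts run; every other expert is skipped entirely,
    # which is where the compute savings come from.
    out = [0.0] * len(x)
    for idx, gate in chosen:
        y = experts[idx](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, [idx for idx, _ in chosen]


# Four toy experts that just scale their input by a different factor.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_forward([0.5, 2.0], router_weights, experts, k=1)
print("experts used:", used)
```

Adding more experts grows the parameter count, but the per-input compute depends only on `k`, which is the decoupling of model size from compute budget described above.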
Google has been pushing the frontier of research on MoEs, and my two guests today in particular have been involved in pioneering work on that strategy (among many others!). Liam Fedus and Barrett Zoph are research scientists at Google Brain, and they joined me to talk about AI scaling, sparsity and the present and future of MoE models on this episode of the TDS podcast.
***
Intro music:
- Artist: Ron Gelinas
- Track Title: Daybreak Chill Blend (original mix)
- Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters: