Genie: Generative Interactive Environments with Ashley Edwards - #696

Published: Aug. 5, 2024, 5:14 p.m.

Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training deep reinforcement learning (RL) agents at scale in a completely unsupervised manner. We explore the motivations behind Genie, the challenges of data acquisition for RL, and Genie’s capability to learn world models from videos without explicit action data, enabling seamless interaction and frame prediction. Ashley walks us through Genie’s core components (the latent action model, video tokenizer, and dynamics model) and explains how these elements collaborate to predict future frames in video sequences. We discuss the model architecture, training strategies, and benchmarks used, as well as the application of spatiotemporal transformers and MaskGIT techniques for efficient token prediction and representation. Finally, we touch on Genie’s practical implications, its comparison to other video generation models like “Sora,” and potential future directions in video generation and diffusion models.

The complete show notes for this episode can be found at https://twimlai.com/go/696.