Provably learning a multi-head attention layer

Published: Feb. 27, 2024, 12:18 a.m.

Sitan Chen speaking at the BIRS workshop 24w5214: Computational Complexity of Statistical Inference (Feb 25 - Mar 01). Recorded by the Banff International Research Station for Mathematical Innovation and Discovery.