Gradient Descent and Attention Models: Challenges Posed by the Softmax Function

Published: June 11, 2024, 11:30 p.m.

Salma Tarmoun speaking at the BIRS workshop 24w5297: Mathematics of Deep Learning (Jun 09 - Jun 14). Recorded by the Banff International Research Station for Mathematical Innovation and Discovery.