What if I train a neural network with random data? (with Stanisław Jastrzębski) (Ep. 87)

Published: Nov. 12, 2019, 6:21 a.m.

b"What happens to a neural network trained with random data? \\nAre massive neural networks just lookup tables or do they truly learn something?\\xa0\\nToday\\u2019s episode will be about memorisation and generalisation in deep learning, with Stanislaw Jastrz\\u0119bski from New York University.\\nStan spent two summers as a visiting student with Prof. Yoshua Bengio\\xa0and has been working on\\xa0\\nUnderstanding and improving how deep network generalise\\nRepresentation Learning\\nNatural Language Processing\\nComputer Aided Drug Design\\n\\xa0\\nWhat makes deep learning unique?\\nI have asked him a few questions for which I was looking for an answer for a long time. For instance, what is deep learning bringing to the table that other methods don\\u2019t or are not capable of?\\xa0Stan believe that the one thing that makes deep learning special is representation learning. All the other competing methods, be it kernel machines, or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning in the sense that it allows finding good representations.\\xa0\\n\\xa0\\nWhat really improves the training quality of a neural network?\\nWe discussed about the accuracy of neural networks depending pretty much on how good the Stochastic Gradient Descent method is at finding minima of the loss function. What would influence such minima?Stan's answer has revealed that training set accuracy or loss value is not that interesting actually. It is relatively easy to overfit data (i.e. achieve the lowest loss possible), provided a large enough network, and a large enough computational budget. However, shape of the minima, or performance on validation sets are in a quite fascinating way influenced by optimisation. Optimisation in the beginning of the trajectory, steers such trajectory towards minima of certain properties that go much further than just training accuracy.\\nAs always we spoke about the future of AI and the role deep learning will play.\\nI hope you enjoy the show!\\nDon't forget to join the conversation on our new Discord channel. See you there!\\n\\xa0\\nReferences\\n\\xa0\\nHomepage of\\xa0Stanis\\u0142aw Jastrz\\u0119bski\\xa0https://kudkudak.github.io/\\nA Closer Look at Memorization in Deep Networks\\xa0https://arxiv.org/abs/1706.05394\\nThree Factors Influencing Minima in SGD\\xa0https://arxiv.org/abs/1711.04623\\nDon't Decay the Learning Rate, Increase the Batch Size\\xa0https://arxiv.org/abs/1711.00489\\nStiffness: A New Perspective on Generalization in Neural Networks\\xa0https://arxiv.org/abs/1901.09491"