Towards demystifying over-parameterization and early stopping in deep learning

Seminar
Friday, May 03, 2019
11:00am - 12:00pm
EER 1.516 (North Tower)

Many modern neural networks are trained in an over-parameterized regime, where the number of parameters in the model exceeds the size of the training dataset. Due to their over-parameterized nature, these models in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, models trained via first-order methods (often with early stopping) continue to predict well on yet-unseen test data. In this talk I will discuss some results aimed at demystifying such phenomena by demonstrating that gradient methods enjoy a few intriguing properties: (1) when initialized at random, the iterates converge at a geometric rate to a global optimum; (2) among all global optima of the loss, the iterates converge to one with good generalization capability; and (3) with early stopping, they are provably robust to noise/corruption/shuffling of a fraction of the labels, fitting only the correct labels and ignoring the corrupted ones.
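To make the early-stopping point concrete, below is a minimal illustrative sketch (not taken from the talk or the speaker's papers): an over-parameterized one-hidden-layer ReLU network is trained by plain gradient descent on a regression task where a fraction of the labels have been corrupted, and a clean held-out set is used to pick the early-stopping iterate. The model, data, and all hyperparameters are assumptions chosen only for illustration.

```python
# Minimal sketch of early stopping under label noise (NumPy only).
# Everything here (sizes, learning rate, noise level) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression problem: n training points, d features, wide hidden layer.
# width * d + width >> n, so the network is over-parameterized.
n, d, width = 50, 10, 500
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

# Corrupt a fraction of the training labels with large noise.
corrupt = rng.random(n) < 0.3
y[corrupt] += 5.0 * rng.standard_normal(corrupt.sum())

# Clean held-out set used only for early stopping.
X_val = rng.standard_normal((200, d))
y_val = X_val @ w_true

# One-hidden-layer ReLU network trained with plain gradient descent.
W1 = rng.standard_normal((d, width)) / np.sqrt(d)
w2 = rng.standard_normal(width) / np.sqrt(width)
lr = 1e-2

def forward(X, W1, w2):
    h = np.maximum(X @ W1, 0.0)   # hidden activations
    return h, h @ w2              # activations and predictions

best_val, best_step = np.inf, 0
for step in range(2000):
    h, pred = forward(X, W1, w2)
    err = pred - y
    # Gradients of 0.5 * mean squared error.
    grad_w2 = h.T @ err / n
    grad_h = np.outer(err, w2) * (h > 0)
    grad_W1 = X.T @ grad_h / n
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

    # Track the validation loss to locate the early-stopping point.
    _, val_pred = forward(X_val, W1, w2)
    val_loss = 0.5 * np.mean((val_pred - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_step = val_loss, step

    if step % 200 == 0:
        clean_fit = 0.5 * np.mean((pred[~corrupt] - y[~corrupt]) ** 2)
        noisy_fit = 0.5 * np.mean((pred[corrupt] - y[corrupt]) ** 2)
        print(f"step {step:4d}  clean-fit {clean_fit:.3f}  "
              f"corrupted-fit {noisy_fit:.3f}  val {val_loss:.3f}")

print(f"best validation loss {best_val:.3f} at step {best_step} (early-stopping point)")
```

Running this sketch, the loss on the clean training points typically drops well before the loss on the corrupted points, so the iterate selected by early stopping fits the correct labels while largely ignoring the corrupted ones; trained long enough, the over-parameterized network would eventually fit the noise as well.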

Speaker

Mahdi Soltanolkotabi
Assistant Professor
University of Southern California

Mahdi Soltanolkotabi is an assistant professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF CAREER award, an Air Force Office of Scientific Research Young Investigator award (AFOSR-YIP), and a Google faculty research award.