Multimedia

Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)

 

Francis Bach (Inria-ENS)

Date: Friday, September 18, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)

Two Facets of Learning Robust Models: Fundamental Limits and Generalization to Natural Out-of-Distribution Inputs
-Prof. Hamed Hassani (Univ. of Pennsylvania)

In this talk, we will focus on the recently-emerged field of (adversarially) robust learning. This field began by the observation that modern learning models, despite the breakthrough performance, remain fragile to seemingly innocuous changes in the data such as small, norm-bounded perturbations of the input data. In response, various training methodologies have been developed for enhancing robustness. However, it is fair to say that our understanding in this field is still at its infancy and several key questions remain widely open. We will consider two such questions.

(1) Fundamental limits: It has been repeatedly observed that improving robustness to perturbed inputs (robust accuracy) comes at the cost of decreasing the accuracy of benign inputs (standard accuracy), leading to a fundamental tradeoff between these often competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In the first part of the talk, we will develop a precise and comprehensive understanding of such tradeoffs in the context of the simple yet foundational problem of linear regression.

(2) Robustness to other types of out-of-distribution inputs: There are other sources of fragility for deep learning that are arguably more common and less studied. Indeed, natural variation such as lighting or weather conditions or device imperfections can significantly degrade the accuracy of trained neural networks, proving that such natural variation presents a significant challenge. To this end, in the second part of the talk, we propose a paradigm shift from perturbation-based adversarial robustness toward a new framework called “model-based robust deep learning”. Using this framework, we will provide general training algorithms that improve the robustness of neural networks against natural variation in data. We will show the success of this framework to improve robustness of modern learning models consistently against many types of natural out-of-distribution inputs and across a variety of commonly-used datasets.

Abstract: In this talk, we aim to quantify the robustness of distributed training against worst-case failures and adversarial nodes. We show that there is a gap between robustness guarantees, depending on whether adversarial nodes have full control of the hardware, the training data, or both. Using ideas from robust statistics and coding theory we establish robust and scalable training methods for centralized, parameter server systems. Perhaps unsurprisingly, we prove that robustness is impossible when a central authority does not own the training data, e.g., in federated learning systems. We then provide a set of attacks that force federated models to exhibit poor performance on either the training, test, or out-of-distribution data sets. Our results and experiments cast doubts on the security presumed by federated learning system providers, and show that if you want robustness, you probably have to give up some of your data privacy.

Bio: Dimitris Papailiopoulos is an Assistant Professor of ECE and CS (by courtesy) at UW-Madison. His research spans machine learning, information theory, and distributed systems, with a current focus on scalable and fault-tolerant distributed machine learning systems. Dimitris was a postdoctoral researcher at UC Berkeley and a member of the AMPLab. He earned his Ph.D. in ECE from UT Austin in 2014, under the supervision of Alex Dimakis. Dimitris is a recipient of the NSF CAREER Award (2019), a Sony Faculty Innovation Award (2019), the Benjamin Smith Reynolds Award for Excellence in Teaching (2019), and the IEEE Signal Processing Society, Young Author Best Paper Award (2015). In 2018, he co-founded MLSys, a new conference that targets research at the intersection of machine learning and systems.

Few-shot classification, the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. I will present what I think are some of the key advances and open questions in this area. I will then focus on the fundamental issue of overfitting in the few-shot scenario. Bayesian methods are well-suited to tackling this issue because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-gamma augmentation and the one-vs-each loss that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.

The seminar was delivered live using Zoom on 5/7/20. 

 

Bio: Richard Zemel is a Professor of Computer Science at the University of Toronto, where he has been a faculty member since 2000. Prior to that, he was an Assistant Professor in Computer Science and Psychology at the University of Arizona and a Postdoctoral Fellow at the Salk Institute and at Carnegie Mellon University. He received a B.Sc. degree in History & Science from Harvard University in 1984 and a Ph.D. in Computer Science from the University of Toronto in 1993. He is also the co-founder of SmartFinance, a financial technology startup specializing in data enrichment and natural language processing.

His awards include an NVIDIA Pioneers of AI Award, a Young Investigator Award from the Office of Naval Research, a Presidential Scholar Award, two NSERC Discovery Accelerators, and seven Dean’s Excellence Awards at the University of Toronto. He is a Fellow of the Canadian Institute for Advanced Research and is on the Executive Board of the Neural Information Processing Society, which runs the premier international machine learning conference.

Pages