Abstract: In this talk, we aim to quantify the robustness of distributed training against worst-case failures and adversarial nodes. We show that there is a gap between robustness guarantees, depending on whether adversarial nodes have full control of the hardware, the training data, or both. Using ideas from robust statistics and coding theory we establish robust and scalable training methods for centralized, parameter server systems. Perhaps unsurprisingly, we prove that robustness is impossible when a central authority does not own the training data, e.g., in federated learning systems. We then provide a set of attacks that force federated models to exhibit poor performance on either the training, test, or out-of-distribution data sets. Our results and experiments cast doubts on the security presumed by federated learning system providers, and show that if you want robustness, you probably have to give up some of your data privacy.

Bio: Dimitris Papailiopoulos is an Assistant Professor of ECE and CS (by courtesy) at UW-Madison. His research spans machine learning, information theory, and distributed systems, with a current focus on scalable and fault-tolerant distributed machine learning systems. Dimitris was a postdoctoral researcher at UC Berkeley and a member of the AMPLab. He earned his Ph.D. in ECE from UT Austin in 2014, under the supervision of Alex Dimakis. Dimitris is a recipient of the NSF CAREER Award (2019), a Sony Faculty Innovation Award (2019), the Benjamin Smith Reynolds Award for Excellence in Teaching (2019), and the IEEE Signal Processing Society, Young Author Best Paper Award (2015). In 2018, he co-founded MLSys, a new conference that targets research at the intersection of machine learning and systems.

Few-shot classification, the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. I will present what I think are some of the key advances and open questions in this area. I will then focus on the fundamental issue of overfitting in the few-shot scenario. Bayesian methods are well-suited to tackling this issue because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-gamma augmentation and the one-vs-each loss that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.

The seminar was delivered live using Zoom on 5/7/20. 


Bio: Richard Zemel is a Professor of Computer Science at the University of Toronto, where he has been a faculty member since 2000. Prior to that, he was an Assistant Professor in Computer Science and Psychology at the University of Arizona and a Postdoctoral Fellow at the Salk Institute and at Carnegie Mellon University. He received a B.Sc. degree in History & Science from Harvard University in 1984 and a Ph.D. in Computer Science from the University of Toronto in 1993. He is also the co-founder of SmartFinance, a financial technology startup specializing in data enrichment and natural language processing.

His awards include an NVIDIA Pioneers of AI Award, a Young Investigator Award from the Office of Naval Research, a Presidential Scholar Award, two NSERC Discovery Accelerators, and seven Dean’s Excellence Awards at the University of Toronto. He is a Fellow of the Canadian Institute for Advanced Research and is on the Executive Board of the Neural Information Processing Society, which runs the premier international machine learning conference.

Abstract: Many machine learning tasks can be posed as structured prediction, where the goal is to predict a labeling or structured object. For example, the input may be an image or a sentence, and the output is a labeling such as an assignment of each pixel in the image to foreground or background, or the parse tree for the sentence. Despite marginal and MAP inference for many of these models being NP-hard in the worst-case, approximate inference algorithms are remarkably successful and as a result structured prediction is widely used.

What makes these real-world instances different from worst-case instances? One key difference is that in all of these applications, there is an underlying "ground truth" which structured prediction is aiming to find. In this talk, I will introduce a new theoretical framework for analyzing structured prediction algorithms in terms of their ability to achieve small Hamming error. We study the computational and statistical trade-offs that arise in this setting, and illustrate a setting where polynomial-time algorithms can perform optimal prediction, despite the corresponding MAP inference task being NP-hard.

Based on joint work with Amir Globerson, Ofer Meshi, Tim Roughgarden, and Cafer Yildirim.

Speaker Bio: David Sontag is an Assistant Professor of Computer Science and Data Science at NYU. Computer Science is part of the Courant Institute of Mathematical Sciences. His research focuses on machine learning and probabilistic inference, with a particular focus on applications to clinical medicine. For example, he is developing algorithms to learn probabilistic models for medical diagnosis directly from unstructured clinical data, automatically discovering and predicting latent (hidden) variables. Prof. Sontag collaborates with the Emergency Medicine Informatics Research Lab at Beth Israel Deaconess Medical Center and with Independence Blue Cross.

Previously, he was a post-doc at Microsoft Research New England. His Ph.D. is in Computer Science from MIT, where he worked with Tommi Jaakkola on approximate inference and learning in probabilistic models. Prof. Sontag received a bachelors degree from UC Berkeley in Computer Science, where he worked with Stuart Russell's First-Order Probabilistic Logic group.

Abstract: Submodular functions capture a wide spectrum of discrete problems in machine learning, signal processing and computer vision. They are characterized by intuitive notions of diminishing returns and economies of scale, and often lead to practical algorithms with theoretical guarantees.

In the first part of this talk, I will give a general introduction to the concept of submodular functions, their optimization and example applications in machine learning.

In the second part, I will demonstrate how the close connection of submodularity to convexity leads to fast algorithms for minimizing a subclass of submodular functions - those decomposing as a sum of submodular functions. Using a specific relaxation, the algorithms solve the discrete submodular optimization problem as a "best approximation" problem. They are easy to use and parallelize, and solve both the convex relaxation and the original discrete problem. Their convergence analysis combines elements of geometry and spectral graph theory.

This is joint work with Robert Nishihara, Francis Bach, Suvrit Sra and Michael I. Jordan.

Speaker Bio: Stefanie Jegelka is the X-Consortium Career Development Assistant Professor in the Department of EECS at MIT, and a member of CSAIL and the Institute for Data, Systems and Society. Before joining MIT, she was a postdoctoral scholar in the AMPLab at UC Berkeley. She earned her PhD from ETH Zurich in collaboration with the Max Planck Institutes in Tuebingen, Germany, and a Diplom from the University of Tuebingen. Her research interests lie in algorithmic and combinatorial machine learning, with applications in computer vision, materials science, and biology. She has been a fellow of the German National Academic Foundation, and has received several other fellowships, as well as a Best Paper Award at ICML and the 2015 Award of the German Society for Pattern Recognition.