WNCG Seminar Series: Scalable and User-Friendly Machine Learning in Apache Spark

Friday, May 22, 2015
UTA 7.532

Modern datasets are rapidly growing in size and complexity, and this wealth of data holds promise for many transformational applications. Machine learning is seemingly poised to deliver on this promise, having proposed and rigorously evaluated a wide range of data processing techniques over the past several decades. However, concerns over scalability and usability present major roadblocks to the wider adoption of these methods. In this talk, I will describe the MLbase project, which aims to address these concerns by developing machine learning functionality on top of Apache Spark, a popular cluster computing engine designed for iterative computation. I will first describe MLlib, Spark’s scalable machine learning library, which grew out of the MLbase project. I will then discuss higher-level components of MLbase, focusing on hyperparameter optimization as a means of simplifying the construction of machine learning pipelines.


Ameet Talwalkar
Assistant Professor
University of California, Los Angeles

Ameet Talwalkar is an assistant professor of Computer Science at UCLA and a technical advisor for Databricks. His research addresses scalability and ease-of-use issues in the field of statistical machine learning, with applications in computational genomics. He led the initial development of the MLlib project in Apache Spark and is a co-author of the graduate-level textbook 'Foundations of Machine Learning' (2012, MIT Press). Prior to UCLA, he was an NSF post-doctoral fellow in the AMPLab at UC Berkeley. He obtained a B.S. from Yale University and a Ph.D. from the Courant Institute at NYU.