Large-scale machine learning training, in particular, distributed stochastic gradient descent (SGD), needs to be robust to inherent system variability such as unpredictable computation and communication delays. This work considers a distributed SGD framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. Our goal is to analyze and improve the true speed of error convergence with respect to wall-clock time (instead of the number of iterations).
There are currently no upcoming events scheduled. Please check back soon for more details.
Machine learning today bears resemblance to the field of aviation soon after the Wright Brothers’ pioneering flights in the early 1900s. It took half a century of aeronautical engineering advances for the ‘Jet Age’ (i.e., commercial aviation) to become a reality. Similarly, machine learning (ML) is currently experiencing a renaissance, yet fundamental barriers must be overcome to fully unlock the potential of ML-powered technology. In this talk, I describe our work to help democratize ML by tackling barriers related to scalability, privacy, and safety.
Much of the prior work on scheduling algorithms for wireless networks focuses on maximizing throughput. However, for many real-time applications, delays and deadline guarantees on packet delivery can be more important than long-term throughput. In this talk, we consider the problem of scheduling deadline-constrained packets in wireless networks, under a conflict-graph interference model. The objective is to guarantee that at least a certain fraction of packets of each link are delivered within their deadlines, which is referred to as delivery ratio.
In this talk, I will talk about principled ways of solving a classical reinforcement learning (RL) problem and introduce its robust variant.
We will discuss two problems that have different application spaces but share a common mathematical core. These problems combine stochastic approximation, an iterative method for finding the fixed point of a function from noisy observations, and consensus, a general averaging technique for multiple agents to cooperatively solve a distributed problem.