# ML Seminar: Distributed Stochastic Approximation: Reinforcement Learning and Optimization with Communication Constraints

We will discuss two problems that have different application spaces but share a common mathematical core. These problems combine stochastic approximation, an iterative method for finding the fixed point of a function from noisy observations, and consensus, a general averaging technique for multiple agents to cooperatively solve a distributed problem.

In the first part of the talk, we will discuss the policy evaluation problem in multi-agent reinforcement learning. In this problem, a set of agents operate in a common environment under a fixed control policy, working together to discover the value (accumulative reward) associated with each environmental state. We give a finite-time analysis on the performance of the well-known "TD-lambda" algorithm that depends on the connectivity of the agents and the intrinsic properties of the Markov decision process driving the agents decisions.

In the second part, we discuss distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. We propose a modified consensus-based gradient method that communicates using random (dithered) quantization. We derive an upper bounds on the rate of convergence of this method as a function of the bandwidth available between the nodes and the underlying network topology, and derive rates for both convex and strongly convex objective functions.