Virtual Seminar: Finite Sample Convergence Bounds of Off-Policy Reinforcement Learning Algorithms

Friday, October 23, 2020

The focus of our work is to obtain finite-sample and/or finite-time convergence bounds for various model-free Reinforcement Learning (RL) algorithms. Many RL algorithms are special cases of Stochastic Approximation (SA), a popular approach for solving fixed-point equations when the available information is corrupted by noise. We first obtain finite-sample bounds for general SA, using a generalized Moreau envelope as a smooth potential/Lyapunov function. We then use this result to establish the first known convergence rate of the V-trace algorithm for off-policy TD-learning, and to recover the state-of-the-art results for tabular Q-learning. We also use Lyapunov drift arguments to provide finite-time error bounds for the Q-learning algorithm with linear function approximation, under an assumption on the sampling policy. This talk is based on two papers.


Siva Theja Maguluri
Assistant Professor
Georgia Institute of Technology

Siva Theja Maguluri is the Fouts Family Early Career Professor and an Assistant Professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech. Before that, he was a Research Staff Member in the Mathematical Sciences Department at the IBM T. J. Watson Research Center. He obtained his Ph.D. and MS in ECE, as well as an MS in Applied Math, from UIUC, and his B.Tech. in Electrical Engineering from IIT Madras. His research interests span the areas of control, optimization, algorithms, and applied probability. In particular, he works on reinforcement learning theory and on scheduling, resource allocation, and revenue optimization problems that arise in a variety of systems, including data centers, cloud computing, wireless networks, blockchains, and ride-hailing systems. He is a recipient of the biennial “Best Publication in Applied Probability” award in 2017 and the “CTL/BP Junior Faculty Teaching Excellence Award” in 2020.