# Virtual Seminar: Finite Sample Convergence Bounds of Off-Policy Reinforcement Learning Algorithms

The focus of our work is to obtain finite-sample and/or finite-time convergence bounds for various model-free Reinforcement Learning (RL) algorithms. Many RL algorithms are special cases of Stochastic Approximation (SA), a popular approach for solving fixed-point equations when the available information is corrupted by noise. We first obtain finite-sample bounds for general SA using a generalized Moreau envelope as a smooth potential (Lyapunov) function. We then use this result to establish the first known convergence rate of the V-trace algorithm for off-policy TD-learning, and to recover the state-of-the-art results for tabular Q-learning. We also use Lyapunov drift arguments to provide finite-time error bounds for the Q-learning algorithm with linear function approximation, under an assumption on the sampling policy. This talk is based on the following papers: https://arxiv.org/abs/2002.00874 and https://arxiv.org/abs/1905.11425
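To make the link between SA and RL concrete, here is a minimal sketch of tabular Q-learning written as a stochastic approximation iteration: each update moves the current estimate toward a noisy sample of the Bellman optimality operator, whose fixed point is the optimal Q-function. Everything specific in it is an assumption made for illustration and is not taken from the papers: the two-state MDP (`P`, `R`), the discount factor, the uniform behavior policy, the constant step size, and all function names.

```python
import random

GAMMA = 0.5  # discount factor (illustrative choice, not from the papers)
# Hypothetical 2-state, 2-action MDP: P[s][a] = probability of moving to state 1.
P = [[0.2, 0.8], [0.5, 0.1]]
R = [[0.0, 1.0], [0.5, 0.2]]  # deterministic reward for taking action a in state s


def bellman_optimality(q):
    """Apply the Bellman optimality operator, a sup-norm contraction with modulus GAMMA."""
    v = [max(q[s]) for s in range(2)]
    return [[R[s][a] + GAMMA * ((1 - P[s][a]) * v[0] + P[s][a] * v[1])
             for a in range(2)] for s in range(2)]


def solve_q_star(iters=200):
    """Compute the fixed point by repeated application (needs the model; reference only)."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        q = bellman_optimality(q)
    return q


def q_learning(steps=40000, step_size=0.01, seed=0):
    """Model-free SA iteration: q <- q + step_size * (noisy operator sample - q)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform behavior policy, so the data is off-policy
        s_next = 1 if rng.random() < P[s][a] else 0
        target = R[s][a] + GAMMA * max(q[s_next])  # noisy sample of the operator at (s, a)
        q[s][a] += step_size * (target - q[s][a])
        s = s_next
    return q


def sup_norm_error(q, q_ref):
    """Largest entrywise gap between two Q-tables."""
    return max(abs(q[s][a] - q_ref[s][a]) for s in range(2) for a in range(2))


if __name__ == "__main__":
    q_star = solve_q_star()
    q_hat = q_learning()
    print("sup-norm error after 40000 samples:", sup_norm_error(q_hat, q_star))
```

With a constant step size, the iterate is expected to settle into a small neighborhood of the fixed point rather than converge exactly. A finite-sample bound quantifies how the size of that neighborhood and the speed of approach depend on the step size and the contraction modulus.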