ML Seminar: Robust Reinforcement Learning with Langevin Dynamics

Monday, February 03, 2020
3:00pm - 4:00pm
EER 3.646

In this talk, I will discuss principled ways of solving a classical reinforcement learning (RL) problem and introduce its robust variant.

In particular, we rethink the exploration-exploitation trade-off in RL as an instance of a distribution sampling problem in infinite dimensions. Using the powerful Stochastic Gradient Langevin Dynamics (SGLD), we propose a new RL algorithm, which results in a sampling variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) method. In several MuJoCo environments, our algorithm consistently outperforms existing exploration strategies for TD3 that rely on heuristic noise injection.

The sampling perspective enables us to introduce an action-robust variant of the RL objective, which is a particular case of a zero-sum two-player Markov game. In this setting, at each step of the game, both players simultaneously choose an action. The reward each player gets after one step depends on the state and the convex combination of the actions of both players. Based on our earlier work (SGLD for min-max/GAN problems), we propose a new robust RL algorithm with a convergence guarantee and provide numerical evidence for the new algorithm. Finally, I will also discuss future directions on the application of the framework to self-play in games.


Prof. Volkan Cevher
Associate Professor
Swiss Federal Institute of Technology Lausanne

Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and also with Rice University in Houston, TX, from 2008 to 2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include signal processing theory, machine learning, convex optimization, and information theory. Dr. Cevher was the recipient of the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC CG in 2016 as well as an ERC StG in 2011.