Virtual Seminar: Reinforcement Learning using Generative Models for Continuous State and Action Space Systems

Friday, October 09, 2020

Meeting Time: 11:00 AM – 12:00 PM Central (CDT; UTC -5)

Reinforcement Learning (RL) problems for continuous state and action space systems are among the most challenging in RL. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems in settings of very large/continuous state and action spaces. But such methods require extensive hyper-parameter tuning and huge amounts of data, and come with no performance guarantees. We note that such methods are mostly trained ‘offline’ on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An ‘empirical value learning’ (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation on a reproducing kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem, due to the universal function approximation property of the RKHS. Last, I will introduce the RANDPOL (randomized function approximation for policy iteration) algorithm, an actor-critic algorithm that uses randomized neural networks and can successfully solve a challenging robotics problem. We also provide theoretical performance guarantees for the algorithm. I will also touch upon the probabilistic contraction analysis framework for iterative stochastic algorithms that underpins the theoretical analysis. This talk is based on joint work with Dileep Kalathil (Texas A&M), Hiteshi Sharma (Microsoft), Abhishek Gupta (Ohio State), William Haskell (Purdue), Vivek Borkar (IIT Bombay) and Peter Glynn (Stanford).
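To illustrate the EVL idea on a finite MDP, the sketch below (a minimal, hypothetical example, not the speaker's implementation) replaces the expectation in the Bellman optimality operator with an empirical average over next states drawn from a generative model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite MDP (made up for illustration): S states, A actions.
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P[s, a, :]
R = rng.random((S, A))                      # rewards r(s, a)
gamma = 0.9

def generative_model(s, a, n):
    """Sample n next states from P(. | s, a), as a generative model would."""
    return rng.choice(S, size=n, p=P[s, a])

def evl_iteration(v, n_samples=50):
    """One EVL backup: the expectation in the Bellman operator is
    replaced by an empirical average over sampled next states."""
    v_new = np.empty(S)
    for s in range(S):
        q = np.empty(A)
        for a in range(A):
            nxt = generative_model(s, a, n_samples)
            q[a] = R[s, a] + gamma * v[nxt].mean()
        v_new[s] = q.max()
    return v_new

v = np.zeros(S)
for _ in range(200):
    v = evl_iteration(v)

# Compare with exact value iteration on the same MDP.
v_exact = np.zeros(S)
for _ in range(200):
    v_exact = (R + gamma * P @ v_exact).max(axis=1)
print(np.abs(v - v_exact).max())  # shrinks as n_samples grows
```

With more samples per backup, the empirical operator concentrates around the true Bellman operator, which is the intuition behind the probabilistic contraction analysis mentioned below.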


Access: The seminar will be delivered live via Zoom. At the date and time above, you can access the talk HERE (Zoom account required).

The Zoom conferencing system is accessible to UT faculty, staff, and students with support from ITS. Otherwise, you can sign up for a free account on the Zoom website.



Rahul Jain
Associate Professor
University of Southern California, Los Angeles

Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech. from IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He has received numerous awards, including the NSF CAREER award, the ONR Young Investigator award, an IBM Faculty award, and the James H. Zumberge Faculty Research and Innovation Award, and is a US Fulbright Scholar. His interests span reinforcement learning, stochastic control, statistical learning, stochastic networks, and game theory, with applications to power systems and healthcare.