Virtual Seminar: Reinforcement Learning using Generative Models for Continuous State and Action Space Systems
Meeting Time: 11:00 AM – 12:00 PM Central (CDT; UTC -5)
Reinforcement learning (RL) problems for continuous state and action space systems are among the most challenging in RL. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems with very large or continuous state and action spaces. But such methods require extensive hyper-parameter tuning and huge amounts of data, and they come with no performance guarantees. We note that such methods are mostly trained 'offline' on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An 'empirical value learning' (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation on a reproducing kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem due to its universal function approximation property. Last, I will introduce the RANDPOL (randomized function approximation for policy iteration) algorithm, an actor-critic algorithm that uses randomized neural networks and can successfully solve a tough robotic problem. We also provide theoretical performance guarantees for the algorithm. I will also touch upon the probabilistic contraction analysis framework of iterative stochastic algorithms that underpins the theoretical analysis. This talk is based on work with a number of people, including Dileep Kalathil (Texas A&M), Hiteshi Sharma (Microsoft), Abhishek Gupta (Ohio State), William Haskell (Purdue), Vivek Borkar (IIT Bombay) and Peter Glynn (Stanford).
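For readers unfamiliar with the idea, the following is a minimal sketch of empirical value learning on a small finite MDP, not the speaker's implementation. It assumes a reward-maximizing, discounted formulation, and the names `make_generative_model`, `empirical_value_learning`, and `exact_value_iteration` are hypothetical. The only change from ordinary value iteration is that the expectation over next states is replaced by an average over samples drawn from the generative model.

```python
import numpy as np


def make_generative_model(P, rng):
    """Hypothetical simulator: draws next states from P[s, a] (shape S x A x S)."""
    n_states = P.shape[2]

    def sample_next_states(s, a, n):
        return rng.choice(n_states, size=n, p=P[s, a])

    return sample_next_states


def empirical_value_learning(sample_next_states, R, gamma, n_samples, n_iters):
    """Value iteration with the expectation replaced by a sample average.

    R has shape (S, A) and holds rewards (an assumption; a cost-minimizing
    version would use min instead of max).
    """
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                # Fresh simulated next states at every iteration.
                nxt = sample_next_states(s, a, n_samples)
                # Empirical estimate of E[v(s') | s, a].
                q[s, a] = R[s, a] + gamma * v[nxt].mean()
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


def exact_value_iteration(P, R, gamma, n_iters):
    """Reference solution using the true transition kernel, for comparison."""
    v = np.zeros(R.shape[0])
    for _ in range(n_iters):
        v = (R + gamma * (P @ v)).max(axis=1)
    return v
```

Because the samples are redrawn at each iteration, the iterates do not converge to a fixed point in the usual sense; they fluctuate in a neighborhood of the optimal value function whose size shrinks as `n_samples` grows.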
Access: Seminar will be delivered live via Zoom. At the date and time above, you can access the talk HERE (Zoom account required).