Efficient Computation of Adaptive Routing Decisions Under Uncertainty

08 Apr 2015

A central question in decision-making under uncertainty is how to mitigate risk. Under the expected utility framework, this means finding the decision (or sequence of decisions) that maximizes expected utility, for some nonlinear utility function representing the agent's risk-averse preferences. In sequential decision-making, the large space of possible decisions makes it especially challenging to compute an optimal policy, that is, an optimal sequence of actions, efficiently. In the prevalent framework of Markov Decision Processes (MDPs), efficient algorithms are known for the special cases of linear utility (corresponding to a risk-neutral agent) and exponential utility. Little is known about efficient computation of the optimal policy for utility functions beyond linear and exponential, and PhD student Darrell Hoy and WNCG faculty Evdokia Nikolova provide examples suggesting that computing the optimal policy in such settings is likely intractable. With the goal of retaining efficient computation, they focus on finding an approximately optimal policy and, in particular, on the question: how well can we approximate routing decisions in polynomial time?
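To illustrate why the utility function matters, here is a minimal sketch with hypothetical numbers (the routes, probabilities, and deadline are illustrative, not from the paper): a risk-neutral agent minimizing expected travel time and a deadline-driven agent maximizing the probability of on-time arrival can prefer different routes.

```python
# Hypothetical two-route example: route A is deterministic, route B is
# usually faster but occasionally very slow. Each route is a list of
# (travel_time, probability) outcomes.
route_a = [(12.0, 1.0)]               # always 12 minutes
route_b = [(8.0, 0.9), (30.0, 0.1)]   # expected time 10.2 minutes

def expected_utility(route, u):
    """E[u(T)] for a discrete travel-time distribution."""
    return sum(p * u(t) for t, p in route)

neutral = lambda t: -t                        # linear utility: minimize E[T]
deadline = lambda t: 1.0 if t <= 15 else 0.0  # step utility: arrive by t = 15

# The risk-neutral agent prefers route B (expected time 10.2 < 12),
# while the deadline agent prefers route A (on-time probability 1.0 > 0.9).
```

The step utility here is exactly the kind of nonlinear objective, the probability of arriving on time, that the linear and exponential special cases do not capture.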

Hoy and Nikolova establish that the optimal routing policy can be approximated arbitrarily well, in polynomial time, under general monotone utility functions that can capture a wide range of risk-averse attitudes of the decision maker. 

At the core of their algorithm is a general-purpose technique based on adaptive discretization that works for any monotone utility function. The main insight is to discretize in utility space, which yields a nonuniform discretization of the domain (cost or travel time). In contrast to much existing research, this approximates the optimal continuous-time policy, not the optimal policy for a time-discretized problem. The distinction matters in nonlinear utility models: standard constant-step discretization of time incurs approximation errors that can become significant when the utility function is a step function (e.g., the probability of arriving on time) or a steep function with a large gradient. Hoy and Nikolova's result covers agents with nonlinear utilities over arrival times, and their model applies equally well when agents have nonlinear utility over the expenditure of a different resource, for instance money or fuel.
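The utility-space discretization idea can be sketched as follows. This is an illustrative toy, not the paper's algorithm: pick uniformly spaced levels in the range of a monotone (here, nonincreasing) utility function and map each level back to the time domain by bisection. The resulting time grid is nonuniform, with points clustering where the utility changes fastest; the sigmoid utility and its parameters below are assumptions chosen to mimic a near-deadline step function.

```python
import math

def invert_decreasing(u, level, lo, hi, tol=1e-9):
    """Bisection: find t in [lo, hi] with u(t) ~= level, for nonincreasing u."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def utility_level_grid(u, lo, hi, k):
    """Split the range of u into k uniform utility levels and pull each
    level back to the time domain, giving a nonuniform time grid."""
    levels = [u(lo) + (u(hi) - u(lo)) * i / k for i in range(1, k)]
    return [lo] + [invert_decreasing(u, lev, lo, hi) for lev in levels] + [hi]

# Steep sigmoid around a deadline d = 10: a smooth stand-in for the
# "probability of arriving on time" step utility.
d, steep = 10.0, 4.0
u = lambda t: 1.0 / (1.0 + math.exp(steep * (t - d)))

grid = utility_level_grid(u, 0.0, 20.0, 8)
# All interior grid points land near t = 10, where u changes fastest;
# a constant-step time grid would waste points on the flat regions.
```

Each adjacent pair of grid points brackets the same amount of utility change, which is why the discretization error can be controlled uniformly for any monotone utility, steep or flat.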

Approximately Optimal Risk-averse Routing Policies via Adaptive Discretization.

Darrell Hoy and Evdokia Nikolova.

In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15). Austin, TX, January 25-30, 2015.