Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor

EADS talk by Thomas Dueholm Hansen, Aarhus University

Ye (2011) showed recently that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes, with a constant discount factor, in strongly polynomial time. We improve Ye's analysis in two respects. First, we show a tighter bound for Howard's policy iteration algorithm. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games when the discount factor is fixed.

Joint work with Peter Bro Miltersen and Uri Zwick.

About the speaker:
Thomas Dueholm Hansen is currently an assistant professor at the department of computer science at Aarhus University. He received a Ph.D. from Aarhus University in 2012, and later worked as a postdoc at Tel Aviv University and Stanford University. His main research interests involve algorithms in optimization and game theory.