Exploiting the Structural Properties of the Underlying Markov Decision Problem in Q-Learning Algorithm
By Sumit Kunnumkal, Huseyin Topaloglu
INFORMS Journal on Computing | 2008
DOI
pubsonline.informs.org/doi/epdf/10.1287/ijoc.1070.0240
Citation
Kunnumkal, Sumit., Huseyin Topaloglu. Exploiting the Structural Properties of the Underlying Markov Decision Problem in Q-Learning Algorithm INFORMS Journal on Computing pubsonline.informs.org/doi/epdf/10.1287/ijoc.1070.0240.
Copyright
INFORMS Journal on Computing, 2008
Share:
Abstract
This paper shows how to exploit the structural properties of the underlying Markov decision problem to improve the convergence behavior of the Q-learning algorithm. In particular, we consider infinite-horizon discounted-cost Markov decision problems where there is a natural ordering between the states of the system and the value function is known to be monotone in the state. We propose a new variant of the Q-learning algorithm that ensures that the value function approxi-mations obtained during the intermediate iterations are also monotone in the state. We establish the convergence of the proposed algorithm and experimentally show that it significantly improves the convergence behavior of the standard version of the Q-learning algorithm.

Sumit Kunnumkal is a Professor and Area Leader of Operations Management at the Indian School of Business (ISB). He holds a PhD in Operations Research from Cornell University. He received his MS in Transportation from the Massachusetts Institute of Technology and a B.Tech in Civil Engineering from the Indian Institute of Technology, Madras.

Professor Kunnumkal has previously taught at the Smith School of Business, Queen’s University, and has held visiting positions at the Singapore University of Technology and Design and Universitat Pompeu Fabra. His research interests lie in the areas of pricing and revenue management, retail operations, assortment planning, and approximate dynamic programming.

At ISB, he has taught in the PGP programme, the Fellow programme, and various Advanced Management and Executive Education programmes.

Sumit Kunnumkal
Sumit Kunnumkal