A Stochastic Approximation method with Max-Norm Projections and its Application to the Q-Learning Algorithm
By Sumit Kunnumkal, Huseyin Topaloglu
ACM Transactions on Computer Modeling and Simulation | September 2010
Citation
Kunnumkal, Sumit., Topaloglu, Huseyin. A Stochastic Approximation method with Max-Norm Projections and its Application to the Q-Learning Algorithm ACM Transactions on Computer Modeling and Simulation .
Copyright
ACM Transactions on Computer Modeling and Simulation, 2010
Share:
Abstract
In this paper, we develop a stochastic approximation method to solve a monotone estimation problem and use this method to enhance the empirical performance of the Q-learning algorithm when applied to Markov decision problems with monotone value functions. We begin by considering a monotone
estimation problem where we want to estimate the expectation of a random vector n. We assume
that the components of E{n} are known to be in increasing order. The stochastic approximation
method that we propose is designed to exploit this information by projecting its iterates onto the set of vectors with increasing components. The novel aspect of the method is that it uses projections with respect to the max norm. We show the almost sure convergence of the stochastic approximation method. After this result, we consider the Q-learning algorithm when applied to Markov decision problems with monotone value functions. We study a variant of the Q-learning algorithm that uses projections to ensure that the value function approximation that is obtained at each iteration is also
monotone. Computational results indicate that the performance of the Q-learning algorithm can be improved significantly by exploiting the monotonicity property of the value functions.

Sumit Kunnumkal is a Professor and Area Leader of Operations Management at the Indian School of Business (ISB). He holds a PhD in Operations Research from Cornell University. He received his MS in Transportation from the Massachusetts Institute of Technology and a B.Tech in Civil Engineering from the Indian Institute of Technology, Madras.

Professor Kunnumkal has previously taught at the Smith School of Business, Queen’s University, and has held visiting positions at the Singapore University of Technology and Design and Universitat Pompeu Fabra. His research interests lie in the areas of pricing and revenue management, retail operations, assortment planning, and approximate dynamic programming.

At ISB, he has taught in the PGP programme, the Fellow programme, and various Advanced Management and Executive Education programmes.

Sumit Kunnumkal
Sumit Kunnumkal