A Stochastic Approximation method with Max-Norm Projections and its Application to the Q-Learning Algorithm

By Sumit Kunnumkal, Huseyin Topaloglu
ACM Transactions on Computer Modeling and Simulation | September 2010

Citation

Kunnumkal, Sumit., Topaloglu, Huseyin. A Stochastic Approximation method with Max-Norm Projections and its Application to the Q-Learning Algorithm ACM Transactions on Computer Modeling and Simulation .

ACM Transactions on Computer Modeling and Simulation, 2010

Abstract

In this paper, we develop a stochastic approximation method to solve a monotone estimation problem and use this method to enhance the empirical performance of the Q-learning algorithm when applied to Markov decision problems with monotone value functions. We begin by considering a monotone
estimation problem where we want to estimate the expectation of a random vector n. We assume
that the components of E{n} are known to be in increasing order. The stochastic approximation
method that we propose is designed to exploit this information by projecting its iterates onto the set of vectors with increasing components. The novel aspect of the method is that it uses projections with respect to the max norm. We show the almost sure convergence of the stochastic approximation method. After this result, we consider the Q-learning algorithm when applied to Markov decision problems with monotone value functions. We study a variant of the Q-learning algorithm that uses projections to ensure that the value function approximation that is obtained at each iteration is also
monotone. Computational results indicate that the performance of the Q-learning algorithm can be improved significantly by exploiting the monotonicity property of the value functions.

Additional information

Contributions

Sumit Kunnumkal

Sumit Kunnumkal is a Professor and Area Leader of Operations Management at the Indian School of Business (ISB). He holds a PhD in Operations Research from Cornell University. He received his MS in Transportation from the Massachusetts Institute of Technology and a B.Tech in Civil Engineering from the Indian Institute of Technology, Madras.

Professor Kunnumkal has previously taught at the Smith School of Business, Queen’s University, and has held visiting positions at the Singapore University of Technology and Design and Universitat Pompeu Fabra. His research interests lie in the areas of pricing and revenue management, retail operations, assortment planning, and approximate dynamic programming.

At ISB, he has taught in the PGP programme, the Fellow programme, and various Advanced Management and Executive Education programmes.

More publications by author

Programmes

Post Graduate Programmes

Advanced Management Programmes

Fellow Programmes

Entrepreneurship Programmes

Online Programmes

Executive Education

Faculty and Research

World-Class Faculty

Research at ISB

Research Centres and Institutes

Research Conferences

Alumni

Alumni Benefits

ISB Women

Alumni Growth Stories

Reflections

Attend an Event

About

Our Story

Vision and Mission

ISB Advantage

Leadership

Partnerships

Life on Campus

Sustainability at ISB

More

Information for

News

Engage with us:

Others