Scaling Up Q-Learning via Exploiting State–Action Equivalence

Research output: Contribution to journal › Journal article › Research › peer-review

Documents

  • Fulltext

    Final published version, 874 KB, PDF document

Recent success stories in reinforcement learning have demonstrated that leveraging structural properties of the underlying environment is key to devising viable methods capable of solving complex tasks. We study off-policy learning in discounted reinforcement learning when the environment admits an equivalence relation among state-action pairs. We introduce a new model-free algorithm, called QL-ES (Q-learning with equivalence structure), a variant of (asynchronous) Q-learning tailored to exploit the equivalence structure in the MDP. We establish a non-asymptotic PAC-type sample complexity bound for QL-ES, thereby proving its sample efficiency. This bound also allows us to quantify analytically the superiority of QL-ES over Q-learning, showing that the theoretical gain in some domains can be massive. We report extensive numerical experiments demonstrating that QL-ES converges significantly faster than (structure-oblivious) Q-learning. These experiments indicate that the empirical performance gain obtained by exploiting the equivalence structure can be substantial, even in simple domains. To the best of our knowledge, QL-ES is the first provably efficient model-free algorithm to exploit the equivalence structure in finite MDPs.
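To give a flavour of the general idea behind exploiting state-action equivalence, the sketch below shows tabular Q-learning in which each observed transition updates every state-action pair declared equivalent to the visited one, rather than only the visited pair. This is a minimal illustrative sketch, not the paper's exact QL-ES update rule: the `env` and `equiv_class_of` interfaces are hypothetical placeholders, and equivalent pairs are assumed to share the same update target, which is a simplification of what a formal equivalence relation on the MDP entails.

```python
# Illustrative sketch only: tabular Q-learning that propagates each update to
# all state-action pairs declared equivalent to the visited one. This is NOT
# the paper's exact QL-ES rule; env and equiv_class_of are hypothetical.
import random
from collections import defaultdict


def q_learning_with_equivalence(env, equiv_class_of, n_steps=50_000,
                                gamma=0.99, epsilon=0.1):
    """env: object with reset() -> state, step(a) -> (next_state, reward, done),
    and a list env.actions. equiv_class_of(s, a): iterable of (state, action)
    pairs assumed equivalent to (s, a), including (s, a) itself."""
    Q = defaultdict(float)      # Q-values keyed by (state, action)
    visits = defaultdict(int)   # per-pair visit counts for decaying step sizes
    state = env.reset()

    for _ in range(n_steps):
        # Epsilon-greedy behaviour policy (off-policy: targets use the max).
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])

        next_state, reward, done = env.step(action)
        target = reward + (0.0 if done else
                           gamma * max(Q[(next_state, a)] for a in env.actions))

        # Apply the update to every pair in the equivalence class, so one
        # observed transition improves many Q-entries at once. (Simplification:
        # equivalent pairs are assumed to share this target verbatim.)
        for (s, a) in equiv_class_of(state, action):
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])

        state = env.reset() if done else next_state
    return Q
```

The intuition this sketch tries to convey matches the abstract: when many state-action pairs behave alike, sharing updates across an equivalence class lets each sample inform several table entries, which is why the sample-complexity gain over structure-oblivious Q-learning can be large.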

Original language: English
Article number: 584
Journal: Entropy
Volume: 25
Issue number: 4
ISSN: 1099-4300
DOIs
Publication status: Published - 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

    Research areas

  • equivalence structure, Markov decision process, Q-learning, reinforcement learning

