DeLTA seminar by Liam Schramm: Exponentially faster exploration in MCTS through state occupancy measure regularization

Speaker
Liam Schramm, Rutgers, The State University of New Jersey.
Title
Exponentially faster exploration in MCTS through state occupancy measure regularization
Abstract
Long-horizon exploration remains a fundamental challenge in reinforcement learning, with traditional methods like policy gradients often requiring a number of episodes exponential in the horizon to discover nonzero rewards. Recent work demonstrates that convex optimization of the state occupancy measure can yield provable regret bounds in discrete environments. Building on this insight, we present a novel MCTS algorithm that accelerates exploration exponentially compared to standard MCTS, while also generalizing to continuous state and action spaces. With minor approximations, this approach becomes a practical solution for high-dimensional tasks characterized by sparse rewards and long time horizons.
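For readers new to the occupancy-measure viewpoint, the sketch below is a loose, self-contained illustration of the general idea only; it is not the MCTS algorithm presented in the talk. The entropy of the state occupancy measure is concave in that measure, so maximizing it is a convex problem, and its gradient, roughly -log d(s), acts as an exploration bonus that pulls the agent toward rarely visited states. The chain environment, the smoothing, and all names below are hypothetical and chosen purely for the example.

```python
# Toy illustration (not the speaker's algorithm) of occupancy-measure-driven
# exploration: under-visited states get a large bonus -log d(s), the gradient
# of the (concave) entropy of the empirical state occupancy measure.

import math
import random
from collections import defaultdict

# Hypothetical chain MDP: states 0..N-1, actions {-1, +1}, reward only at N-1.
N = 20

def step(state, action):
    next_state = min(max(state + action, 0), N - 1)
    reward = 1.0 if next_state == N - 1 else 0.0
    return next_state, reward

visit_counts = defaultdict(int)   # unnormalized empirical state occupancy
total_visits = 0

def occupancy_bonus(state):
    """Bonus ~ -log d(s): large for rarely visited states, small otherwise."""
    d = (visit_counts[state] + 1) / (total_visits + 1)  # smoothed estimate
    return -math.log(d)

def choose_action(state):
    """Greedy one-step lookahead on the occupancy bonus, with tiny tie-breaking noise."""
    scored = []
    for action in (-1, +1):
        next_state, _ = step(state, action)
        scored.append((occupancy_bonus(next_state) + 1e-3 * random.random(), action))
    return max(scored)[1]

for episode in range(200):
    state = 0
    for _ in range(2 * N):
        action = choose_action(state)
        state, reward = step(state, action)
        visit_counts[state] += 1
        total_visits += 1
        if reward > 0:
            break

print("states visited:", len(visit_counts), "of", N)
```

The only point of the sketch is the mechanism: states with low empirical occupancy score highest, so a greedy learner keeps pushing into unvisited parts of the chain instead of dithering near the start.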
_____________________________
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar
DeLTA Lab page

DeLTA is a research group affiliated with the Department of Computer Science at the University of Copenhagen, studying diverse aspects of Machine Learning Theory and its applications, including, but not limited to, Reinforcement Learning, Online Learning and Bandits, and PAC-Bayesian analysis.