DeLTA seminar by Liam Schramm: Exponentially faster exploration in MCTS through state occupancy measure regularization

Join via Zoom

Speaker

Liam Schramm, Rutgers, The State University of New Jersey.

Title

Exponentially faster exploration in MCTS through state occupancy measure regularization

Abstract

Long-horizon exploration remains a fundamental challenge in reinforcement learning, with traditional methods like policy gradients often requiring an exponential number of episodes to discover nonzero rewards. Recent work demonstrates that convex optimization of the state occupancy measure can yield provable regret bounds in discrete environments. Building on this insight, we present a novel MCTS algorithm that accelerates exploration exponentially compared to standard MCTS, while also generalizing to continuous state and action spaces. With minor approximations, this approach becomes a practical solution for high-dimensional tasks characterized by sparse rewards and long time horizons.
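The abstract does not give the algorithm's exact objective, but the core idea of regularizing search with the state occupancy measure can be illustrated with a toy selection rule: a standard UCT score augmented with a penalty on how often the search tree has already visited a child's state. The penalty weight `lam`, the linear penalty form, and all names below are illustrative assumptions, not the speaker's method.

```python
import math

def regularized_uct_score(q, n_child, n_parent, occ, c=1.4, lam=0.5):
    """UCT score plus a penalty on the empirical occupancy of the
    child's state, pushing search toward rarely-visited states.
    (The linear penalty and the weight lam are illustrative choices.)"""
    exploration_bonus = c * math.sqrt(math.log(n_parent) / n_child)
    return q + exploration_bonus - lam * occ

def select_child(children, occupancy, n_parent):
    # children: list of (state, q_value, visit_count) tuples
    best = max(children,
               key=lambda ch: regularized_uct_score(
                   ch[1], ch[2], n_parent, occupancy[ch[0]]))
    return best[0]

# Two children with identical value estimates and visit counts:
# plain UCT ties, but the occupancy penalty steers selection toward
# the state the tree as a whole has visited less often.
occupancy = {"s_left": 0.9, "s_right": 0.1}  # empirical visit frequencies
children = [("s_left", 0.0, 10), ("s_right", 0.0, 10)]
print(select_child(children, occupancy, n_parent=20))  # -> s_right
```

In sparse-reward, long-horizon tasks, per-node exploration bonuses alone can still revisit the same regions exponentially often; a global occupancy term of this kind is one way to make the tree spread its visits across the state space.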

_____________________________

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.