DeLTA seminar by Andrea Tirinzoni: On Instance-optimal PAC Reinforcement Learning

Delta logo

Speaker

Andrea Tirinzoni, postdoctoral researcher in the FAIR team at Meta (Paris).

Title

On Instance-optimal PAC Reinforcement Learning

Abstract

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify a near-optimal policy with high probability. While worst-case optimality has been attained for this problem, it is still unclear how an algorithm can optimally adapt to the complexity of the specific problem instance it faces. Toward addressing this limitation, in this talk, I will review some of our recent works on instance optimality in tabular episodic Markov decision processes (MDPs). First, for MDPs with deterministic transition dynamics, I will present an instance-dependent lower bound on the sample complexity of PAC RL and a computationally-efficient algorithm nearly matching it. While our lower bound is written as a linear program related to graph-theoretical concepts (minimum flows), our algorithm is very simple and does not require solving such an optimization problem explicitly. Second, for general MDPs with stochastic dynamics, I will show, through another instance-dependent lower bound, that the PEDEL algorithm by Wagenmaker and Jamieson (2022) is near instance-optimal. Considering the intractability of PEDEL, I will formulate an open question regarding the possibility of achieving our lower bound using a computationally-efficient algorithm.

_____________________________

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar
DeLTA Lab page