DeLTA Seminar by Xiaoyan Hu: (More) Sample-Efficient Offline RL with Options
Speaker:
Xiaoyan Hu from the Chinese University of Hong Kong
Title:
(More) Sample-Efficient Offline RL with Options
Abstract:
The options framework has yielded empirical success in long-horizon planning problems in reinforcement learning (RL). Recent work shows that options improve sample efficiency in online RL. However, these results do not apply to scenarios where exploring the environment online is risky, e.g., automated driving and healthcare. In this paper, we provide the first analysis of the sample complexity of offline RL with options, where the agent learns from a fixed dataset without further interaction with the environment. We derive a novel information-theoretic lower bound, which generalizes the one for offline learning with actions. We propose the PEssimistic Value Iteration for Learning with Options (PEVIO) algorithm and establish near-optimal suboptimality bounds for two popular data-collection procedures: the first collects state-option transitions and the second collects state-action transitions. We show that, compared to offline RL with actions, learning with options not only enjoys a faster finite-time convergence rate (to the optimal value) but also attains better performance when either the options are carefully designed or the offline data is limited. Based on these results, we analyze the pros and cons of the two data-collection procedures.
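To make the flavor of pessimistic value iteration over options concrete, here is a minimal, illustrative Python sketch. It is not the PEVIO algorithm from the paper: it assumes a tabular finite-horizon setting, treats each option as a temporally extended action with an empirical cumulative reward and termination-state distribution estimated from offline counts, and uses a simple 1/sqrt(n) pessimism penalty. All names (pevi_options, counts, reward_sums, next_state_counts, beta) are hypothetical.

```python
import numpy as np

def pevi_options(counts, reward_sums, next_state_counts, horizon, beta=1.0):
    """Illustrative pessimistic value iteration over options (not the paper's exact PEVIO).

    counts[s, o]                 -- times option o was launched from state s in the dataset
    reward_sums[s, o]            -- total cumulative reward observed over those launches
    next_state_counts[s, o, s']  -- how often those launches terminated in state s'
    horizon                      -- number of option-level decision stages
    beta                         -- scale of the pessimism penalty (assumed constant here)
    """
    n_states, n_options = counts.shape
    V = np.zeros(n_states)                           # value of the next stage
    policy = np.zeros((horizon, n_states), dtype=int)

    for h in reversed(range(horizon)):
        n = np.maximum(counts, 1)                    # avoid division by zero for unseen (s, o)
        r_hat = reward_sums / n                      # empirical mean cumulative reward per option
        P_hat = next_state_counts / n[:, :, None]    # empirical termination-state distribution
        bonus = beta * np.sqrt(1.0 / n)              # pessimism penalty, shrinks with more data
        Q = np.maximum(r_hat + P_hat @ V - bonus, 0.0)  # pessimistic option values, clipped at 0
        policy[h] = Q.argmax(axis=1)                 # greedy option at stage h
        V = Q.max(axis=1)

    return policy, V
```

The key design choice this sketch illustrates is pessimism: option values are penalized in proportion to how rarely the corresponding state-option pair appears in the offline data, so the learned policy avoids options whose outcomes the dataset cannot certify.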