Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed

Standard

Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. / Cipollone, Roberto ; Jonsson, Anders ; Ronca, Alessandro ; Talebi, Mohammad Sadegh.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, 2023. (Advances in Neural Information Processing Systems, Vol. 36).


Harvard

Cipollone, R, Jonsson, A, Ronca, A & Talebi, MS 2023, Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. in Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, Advances in Neural Information Processing Systems, vol. 36, 37th Conference on Neural Information Processing Systems - NeurIPS 2023, New Orleans, USA, 10/12/2023. <https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html>

APA

Cipollone, R., Jonsson, A., Ronca, A., & Talebi, M. S. (2023). Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings. (Advances in Neural Information Processing Systems, Vol. 36). https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html

Vancouver

Cipollone R, Jonsson A, Ronca A, Talebi MS. Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings. 2023. (Advances in Neural Information Processing Systems, Vol. 36).

Author

Cipollone, Roberto ; Jonsson, Anders ; Ronca, Alessandro ; Talebi, Mohammad Sadegh. / Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, 2023. (Advances in Neural Information Processing Systems, Vol. 36).

Bibtex

@inproceedings{430ca7828af74caba82b0ef9d070d4e9,
title = "Provably Efficient Offline Reinforcement Learning in Regular Decision Processes",
abstract = "This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which involves a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.",
author = "Roberto Cipollone and Anders Jonsson and Alessandro Ronca and Talebi, {Mohammad Sadegh}",
year = "2023",
language = "English",
series = "Advances in Neural Information Processing Systems",
publisher = "NeurIPS Proceedings",
booktitle = "Advances in Neural Information Processing Systems 36 (NeurIPS 2023)",
note = "37th Conference on Neural Information Processing Systems - NeurIPS 2023 ; Conference date: 10-12-2023 Through 16-12-2023",
url = "https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html",

}
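The abstract describes RegORL as a modular two-phase pipeline: first learn a finite-state automaton from the non-Markov traces, then relabel the dataset with automaton states so that any off-the-shelf offline RL algorithm for MDPs can be run on the result. The toy sketch below illustrates that pipeline shape only; all function names and the trivial prefix-tree "automaton" and tabular Q-iteration solver are illustrative placeholders, not the paper's actual method or bounds.

```python
# Hedged sketch of a two-phase "learn automaton, then run offline RL" pipeline.
# Placeholders throughout: a real RDP learner would merge equivalent histories;
# here a prefix tree stands in for the learned automaton.
from collections import defaultdict

def learn_automaton(episodes):
    """Phase 1 stand-in: assign a state id to every observation-history prefix."""
    states = {(): 0}
    for ep in episodes:
        prefix = ()
        for obs, _, _ in ep:
            prefix = prefix + (obs,)
            if prefix not in states:
                states[prefix] = len(states)
    return states

def relabel(episodes, states):
    """Rewrite non-Markov traces as (state, action, reward, next_state) tuples,
    producing a Markov dataset for an off-the-shelf offline RL algorithm."""
    data = []
    for ep in episodes:
        prefix = ()
        for obs, act, rew in ep:
            s = states[prefix]
            prefix = prefix + (obs,)
            data.append((s, act, rew, states[prefix]))
    return data

def offline_rl(dataset, n_actions, gamma=0.9, iters=50):
    """Phase 2 placeholder: naive tabular Q-iteration over the relabeled data."""
    q = defaultdict(float)
    for _ in range(iters):
        for s, a, r, s2 in dataset:
            q[(s, a)] = r + gamma * max(q[(s2, b)] for b in range(n_actions))
    return q

# Tiny synthetic dataset: two episodes of (observation, action, reward) triples.
episodes = [[("x", 0, 0.0), ("y", 1, 1.0)],
            [("x", 1, 0.0), ("z", 0, 0.0)]]
states = learn_automaton(episodes)
q = offline_rl(relabel(episodes, states), n_actions=2)
policy = {s: max(range(2), key=lambda a: q[(s, a)]) for s in states.values()}
```

The point of the modular design noted in the abstract is that `offline_rl` here could be swapped for any offline MDP algorithm without touching the automaton-learning phase.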

RIS

TY - GEN

T1 - Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

AU - Cipollone, Roberto

AU - Jonsson, Anders

AU - Ronca, Alessandro

AU - Talebi, Mohammad Sadegh

PY - 2023

Y1 - 2023

N2 - This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which involves a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.

AB - This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which involves a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.

M3 - Article in proceedings

T3 - Advances in Neural Information Processing Systems

BT - Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

PB - NeurIPS Proceedings

UR - https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html

T2 - 37th Conference on Neural Information Processing Systems - NeurIPS 2023

Y2 - 10 December 2023 through 16 December 2023

ER -

ID: 386160160