Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. / Cipollone, Roberto; Jonsson, Anders; Ronca, Alessandro; Talebi, Mohammad Sadegh.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, 2023. (Advances in Neural Information Processing Systems, Vol. 36).


Harvard

Cipollone, R, Jonsson, A, Ronca, A & Talebi, MS 2023, Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. in Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, Advances in Neural Information Processing Systems, vol. 36, 37th Conference on Neural Information Processing Systems - NeurIPS 2023, New Orleans, United States, 10/12/2023. <https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html>

APA

Cipollone, R., Jonsson, A., Ronca, A., & Talebi, M. S. (2023). Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings. (Advances in Neural Information Processing Systems, Vol. 36). https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html

Vancouver

Cipollone R, Jonsson A, Ronca A, Talebi MS. Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings. 2023. (Advances in Neural Information Processing Systems, Vol. 36).

Author

Cipollone, Roberto; Jonsson, Anders; Ronca, Alessandro; Talebi, Mohammad Sadegh. / Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings, 2023. (Advances in Neural Information Processing Systems, Vol. 36).

Bibtex

@inproceedings{430ca7828af74caba82b0ef9d070d4e9,
title = "Provably Efficient Offline Reinforcement Learning in Regular Decision Processes",
abstract = "This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.",
author = "Roberto Cipollone and Anders Jonsson and Alessandro Ronca and Talebi, {Mohammad Sadegh}",
year = "2023",
language = "English",
series = "Advances in Neural Information Processing Systems",
publisher = "NeurIPS Proceedings",
booktitle = "Advances in Neural Information Processing Systems 36 (NeurIPS 2023)",
note = "37th Conference on Neural Information Processing Systems - NeurIPS 2023 ; Conference date: 10-12-2023 Through 16-12-2023",

}
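For readers who want a concrete picture of the pipeline the abstract describes, the Python sketch below illustrates RegORL's modular two-stage structure: learn the hidden finite-state automaton from the episodes, relabel the data with automaton states so that transitions become Markov, and hand the result to an off-the-shelf offline RL algorithm for MDPs. This is a minimal sketch under assumptions, not the authors' implementation; every identifier here (reg_orl, Automaton, learn_automaton, offline_rl_in_mdps) is hypothetical and not taken from the paper.

# Hypothetical sketch of the two-stage, modular pipeline described in the
# abstract; names and data layout are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, float]   # (action, observation, reward)
Episode = List[Step]

@dataclass
class Automaton:
    """Deterministic finite-state automaton over action-observation symbols."""
    initial_state: int
    delta: Dict[Tuple[int, Tuple[str, str]], int]

    def step(self, state: int, symbol: Tuple[str, str]) -> int:
        return self.delta[(state, symbol)]

def reg_orl(
    dataset: List[Episode],
    learn_automaton: Callable[[List[Episode]], Automaton],
    offline_rl_in_mdps: Callable[[List[Episode]], object],
) -> object:
    # Stage 1: recover a finite-state automaton that tracks the relevant
    # history, so that (automaton state, observation) pairs are Markov.
    automaton = learn_automaton(dataset)

    # Stage 2: relabel every transition with the tracked automaton state,
    # then delegate to any offline RL algorithm for MDPs.
    relabelled: List[Episode] = []
    for episode in dataset:
        q = automaton.initial_state
        markov_episode: Episode = []
        for action, obs, reward in episode:
            markov_episode.append((f"{q}:{obs}", action, reward))
            q = automaton.step(q, (action, obs))
        relabelled.append(markov_episode)
    return offline_rl_in_mdps(relabelled)

Because the MDP solver is passed in as a parameter, any offline RL algorithm for MDPs can be swapped in, which is the modularity the abstract highlights.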

RIS

TY - GEN

T1 - Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

AU - Cipollone, Roberto

AU - Jonsson, Anders

AU - Ronca, Alessandro

AU - Talebi, Mohammad Sadegh

PY - 2023

Y1 - 2023

N2 - This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.

AB - This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.

M3 - Article in proceedings

T3 - Advances in Neural Information Processing Systems

BT - Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

PB - NeurIPS Proceedings

T2 - 37th Conference on Neural Information Processing Systems - NeurIPS 2023

Y2 - 10 December 2023 through 16 December 2023

UR - https://proceedings.neurips.cc/paper_files/paper/2023/hash/7bf3e93543a612b75b6373178ba1faa4-Abstract-Conference.html

ER -
