Improved Exploration in Factored Average-Reward MDPs

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Documents

Full text
Publisher's published version, 611 KB, PDF document

Sadegh Talebi
Anders Jonsson
Odalric-Ambrym Maillard

We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the state-action space $\mathcal{X}$ and the state space $\mathcal{S}$ admit the respective factored forms $\mathcal{X} = \bigotimes_{i=1}^{n} \mathcal{X}_i$ and $\mathcal{S} = \bigotimes_{i=1}^{m} \mathcal{S}_i$, and the transition and reward functions are factored over $\mathcal{X}$ and $\mathcal{S}$. Assuming a known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound whose leading term strictly improves over existing regret bounds in its dependence on the sizes of the $\mathcal{S}_i$'s and on the diameter. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of the regrets of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL achieves substantially lower empirical regret than existing algorithms that have frequentist regret guarantees.
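To make two ingredients of the abstract more concrete, the following Python sketch shows (i) a factored transition estimate, where each next-state factor $\mathcal{S}_i$ depends only on a known subset ("scope") of the state-action factors, and (ii) a Bernstein-style confidence half-width computed separately for each individual element $P_i(y \mid z)$. It is a minimal illustration under stated assumptions, not the DBN-UCRL implementation: the class name `FactoredTransitionEstimator`, the `scopes` representation, and the width formula $\sqrt{2\hat p(1-\hat p)L/N} + 7L/(3(N-1))$ with $L = \log(2/\delta)$ (an empirical-Bernstein form) are all hypothetical choices, and the paper's exact confidence sets, constants, and logarithmic terms may differ.

```python
import math
from collections import defaultdict


class FactoredTransitionEstimator:
    """Hypothetical sketch: per-factor counts and Bernstein-style widths.

    scopes[i] lists the indices of the state-action factors that the i-th
    next-state factor depends on (assumed known, as in the abstract).
    """

    def __init__(self, scopes, delta=0.05):
        self.scopes = scopes
        # Illustrative log term; the paper's exact term is not reproduced here.
        self.log_term = math.log(2.0 / delta)
        # counts[i][(z, y)]: times factor i took next value y from scope value z
        self.counts = [defaultdict(int) for _ in scopes]
        # visits[i][z]: times scope value z was observed for factor i
        self.visits = [defaultdict(int) for _ in scopes]

    def _scope_value(self, i, x):
        return tuple(x[j] for j in self.scopes[i])

    def update(self, x, s_next):
        """Record one transition from state-action tuple x to next state s_next."""
        for i in range(len(self.scopes)):
            z = self._scope_value(i, x)
            self.visits[i][z] += 1
            self.counts[i][(z, s_next[i])] += 1

    def estimate(self, i, z, y):
        """Empirical estimate of the single element P_i(y | z)."""
        n = self.visits[i].get(z, 0)
        return self.counts[i].get((z, y), 0) / n if n > 0 else 0.0

    def width(self, i, z, y):
        """Bernstein-style half-width for the single element P_i(y | z)."""
        n = self.visits[i].get(z, 0)
        if n < 2:
            return 1.0  # too few samples: trivial bound for a probability
        p = self.estimate(i, z, y)
        var = p * (1.0 - p)  # variance of a Bernoulli(p) indicator
        w = (math.sqrt(2.0 * var * self.log_term / n)
             + 7.0 * self.log_term / (3.0 * (n - 1)))
        return min(w, 1.0)

    def joint_estimate(self, x, s_next):
        """Factored estimate: product of the per-factor estimates."""
        prob = 1.0
        for i in range(len(self.scopes)):
            prob *= self.estimate(i, self._scope_value(i, x), s_next[i])
        return prob


# Usage with made-up data: two binary state factors and one action,
# so x = (s1, s2, a); factor 0 depends on (s1, a), factor 1 on (s2, a).
est = FactoredTransitionEstimator(scopes=[(0, 2), (1, 2)], delta=0.05)
for _ in range(50):
    est.update(x=(0, 0, 1), s_next=(1, 0))
print(est.estimate(0, (0, 1), 1))
print(est.joint_estimate((0, 0, 1), (1, 0)))
```

Because the variance term $\hat p(1-\hat p)$ vanishes as $\hat p$ approaches 0 or 1, a width of this form shrinks roughly like $1/N$ for near-deterministic transition elements instead of $1/\sqrt{N}$, which is one reason Bernstein-type sets tend to be tighter than Hoeffding-type ones.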

Original language	English
Title of host publication	Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS)
Publisher	PMLR
Publication date	2021
Pages	3988-3996
Status	Published - 2021
Event	24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021) - San Diego, USA. Duration: 13 Apr 2021 → 15 Apr 2021

Conference

Conference	24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
Country	USA
City	San Diego
Period	13/04/2021 → 15/04/2021

Name	Proceedings of Machine Learning Research
Volume	130
ISSN	2640-3498


ID: 301365745

Department of Computer Science