Logarithmic regret in communicating MDPs: Leveraging known dynamics with bandits
Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt
Dokumenter
- Fulltext
Forlagets udgivne version, 883 KB, PDF-dokument
We study regret minimization in an average-reward and communicating Markov Decision Process (MDP) with known dynamics, but unknown reward function. Although learning in such MDPs is a priori easier than in fully unknown ones, they are still largely challenging as they include as special cases large classes of problems such as combinatorial semi-bandits. Leveraging the knowledge on transition function in regret minimization, in a statistically efficient way, appears largely unexplored. As it is conjectured that achieving exact optimality in generic MDPs is NP-hard, even with known transitions, we focus on a computationally efficient relaxation, at the cost of achieving order-optimal logarithmic regret instead of exact optimality. We contribute to filling this gap by introducing a novel algorithm based on the popular Indexed Minimum Empirical Divergence strategy for bandits. A key component of the proposed algorithm is a carefully designed stopping criterion leveraging the recurrent classes induced by stationary policies. We derive a non-asymptotic, problem-dependent, and logarithmic regret bound for this algorithm, which relies on a novel regret decomposition leveraging the structure. We further provide an efficient implementation and experiments illustrating its promising empirical performance.
Originalsprog | Engelsk |
---|---|
Titel | Proceedings of the 15th Asian Conference on Machine Learning |
Forlag | PMLR |
Publikationsdato | 2024 |
Sider | 1167-1182 |
Status | Udgivet - 2024 |
Begivenhed | 15th Asian Conference on Machine Learning - İstanbul, Tyrkiet Varighed: 11 nov. 2023 → 14 nov. 2023 |
Konference
Konference | 15th Asian Conference on Machine Learning |
---|---|
Land | Tyrkiet |
By | İstanbul |
Periode | 11/11/2023 → 14/11/2023 |
Navn | Proceedings of Machine Learning Research |
---|---|
Vol/bind | 222 |
ISSN | 2640-3498 |
Links
- https://proceedings.mlr.press/v222/saber24a.html
Forlagets udgivne version
ID: 386154817