A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt

Dokumenter

Fulltext
Forlagets udgivne version, 528 KB, PDF-dokument

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is O(√TK + √dT log K), where T is the time horizon, K is the number of arms, and d is the fixed delay, whereas the stochastic regret guarantee is O (equation presented), where Δ_i are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay d_max and achieves O(√TK + √Dlog K + d_maxK^1/3 log K) regret in the adversarial regime, where D is the total delay, and O (equation presented) regret in the stochastic regime, where σ_max is the maximal number of outstanding observations. Finally, we present a lower bound that matches the refined adversarial regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.

Originalsprog	Engelsk
Titel	Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Redaktører	S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Antal sider	26
Forlag	NeurIPS Proceedings
Publikationsdato	2022
ISBN (Elektronisk)	9781713871088
Status	Udgivet - 2022
Begivenhed	36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, USA Varighed: 28 nov. 2022 → 9 dec. 2022

Konference

Konference	36th Conference on Neural Information Processing Systems, NeurIPS 2022
Land	USA
By	New Orleans
Periode	28/11/2022 → 09/12/2022

Navn	Advances in Neural Information Processing Systems
Vol/bind	35
ISSN	1049-5258

Bibliografisk note

Funding Information:
This project has received funding from European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199. YS acknowledges partial support by the Independent Research Fund Denmark, grant number 9040-00361B.

Funding Information:
This project has received funding from European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199. YS acknowledges partial support by the Independent Research Fund Denmark, grant number 9040-00361B.

Publisher Copyright:
© 2022 Neural information processing systems foundation. All rights reserved.

ID: 383431352

Datalogisk Institut