A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt

Standard

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback. / Masoudian, Saeed; Zimmert, Julian; Seldin, Yevgeny.

Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. red. / S. Koyejo; S. Mohamed; A. Agarwal; D. Belgrave; K. Cho; A. Oh. NeurIPS Proceedings, 2022. (Advances in Neural Information Processing Systems, Bind 35).

Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt

Harvard

Masoudian, S, Zimmert, J & Seldin, Y 2022, A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback. i S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho & A Oh (red), Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. NeurIPS Proceedings, Advances in Neural Information Processing Systems, bind 35, 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, USA, 28/11/2022.

APA

Masoudian, S., Zimmert, J., & Seldin, Y. (2022). A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback. I S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (red.), Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 NeurIPS Proceedings. Advances in Neural Information Processing Systems Bind 35

Vancouver

Masoudian S, Zimmert J, Seldin Y. A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback. I Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, red., Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. NeurIPS Proceedings. 2022. (Advances in Neural Information Processing Systems, Bind 35).

Author

Masoudian, Saeed ; Zimmert, Julian ; Seldin, Yevgeny. / A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback. Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. red. / S. Koyejo ; S. Mohamed ; A. Agarwal ; D. Belgrave ; K. Cho ; A. Oh. NeurIPS Proceedings, 2022. (Advances in Neural Information Processing Systems, Bind 35).

Bibtex

@inproceedings{78817275adbf4db8add1e629601cf074,

title = "A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback",

abstract = "We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is O(√TK + √dT log K), where T is the time horizon, K is the number of arms, and d is the fixed delay, whereas the stochastic regret guarantee is O (equation presented), where Δi are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay dmax and achieves O(√TK + √Dlog K + dmaxK1/3 log K) regret in the adversarial regime, where D is the total delay, and O (equation presented) regret in the stochastic regime, where σmax is the maximal number of outstanding observations. Finally, we present a lower bound that matches the refined adversarial regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.",

author = "Saeed Masoudian and Julian Zimmert and Yevgeny Seldin",

note = "Publisher Copyright: {\textcopyright} 2022 Neural information processing systems foundation. All rights reserved.; 36th Conference on Neural Information Processing Systems, NeurIPS 2022 ; Conference date: 28-11-2022 Through 09-12-2022",

year = "2022",

language = "English",

series = "Advances in Neural Information Processing Systems",

publisher = "NeurIPS Proceedings",

editor = "S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh",

booktitle = "Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022",

}

RIS

TY - GEN

T1 - A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

AU - Masoudian, Saeed

AU - Zimmert, Julian

AU - Seldin, Yevgeny

PY - 2022

Y1 - 2022

N2 - We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is O(√TK + √dT log K), where T is the time horizon, K is the number of arms, and d is the fixed delay, whereas the stochastic regret guarantee is O (equation presented), where Δi are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay dmax and achieves O(√TK + √Dlog K + dmaxK1/3 log K) regret in the adversarial regime, where D is the total delay, and O (equation presented) regret in the stochastic regime, where σmax is the maximal number of outstanding observations. Finally, we present a lower bound that matches the refined adversarial regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.

AB - We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is O(√TK + √dT log K), where T is the time horizon, K is the number of arms, and d is the fixed delay, whereas the stochastic regret guarantee is O (equation presented), where Δi are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay dmax and achieves O(√TK + √Dlog K + dmaxK1/3 log K) regret in the adversarial regime, where D is the total delay, and O (equation presented) regret in the stochastic regime, where σmax is the maximal number of outstanding observations. Finally, we present a lower bound that matches the refined adversarial regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.

M3 - Article in proceedings

AN - SCOPUS:85146210868

T3 - Advances in Neural Information Processing Systems

BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022

A2 - Koyejo, S.

A2 - Mohamed, S.

A2 - Agarwal, A.

A2 - Belgrave, D.

A2 - Cho, K.

A2 - Oh, A.

PB - NeurIPS Proceedings

T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022

Y2 - 28 November 2022 through 9 December 2022

ER -

ID: 383431352

Datalogisk Institut