An algorithm for stochastic and adversarial bandits with switching costs

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

An algorithm for stochastic and adversarial bandits with switching costs. / Rouyer, Chloé; Seldin, Yevgeny; Cesa-Bianchi, Nicolò.

Proceedings of the 38th International Conference on Machine Learning (ICML). PMLR, 2021. p. 9127-9135 (Proceedings of Machine Learning Research, Vol. 139).

Harvard

Rouyer, C, Seldin, Y & Cesa-Bianchi, N 2021, An algorithm for stochastic and adversarial bandits with switching costs. in Proceedings of the 38th International Conference on Machine Learning (ICML). PMLR, Proceedings of Machine Learning Research, vol. 139, pp. 9127-9135, 38th International Conference on Machine Learning (ICML), Virtual, 18/07/2021. <https://proceedings.mlr.press/v139/>

APA

Rouyer, C., Seldin, Y., & Cesa-Bianchi, N. (2021). An algorithm for stochastic and adversarial bandits with switching costs. In Proceedings of the 38th International Conference on Machine Learning (ICML) (pp. 9127-9135). PMLR. Proceedings of Machine Learning Research Vol. 139 https://proceedings.mlr.press/v139/

Vancouver

Rouyer C, Seldin Y, Cesa-Bianchi N. An algorithm for stochastic and adversarial bandits with switching costs. In Proceedings of the 38th International Conference on Machine Learning (ICML). PMLR. 2021. p. 9127-9135. (Proceedings of Machine Learning Research, Vol. 139).

Author

Rouyer, Chloé ; Seldin, Yevgeny ; Cesa-Bianchi, Nicolò. / An algorithm for stochastic and adversarial bandits with switching costs. Proceedings of the 38th International Conference on Machine Learning (ICML). PMLR, 2021. pp. 9127-9135 (Proceedings of Machine Learning Research, Vol. 139).

Bibtex

@inproceedings{b0f5af2522d04170bce18fc9c6880ae4,
title = "An algorithm for stochastic and adversarial bandits with switching costs",
abstract = "We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3} T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\big(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big) \sum_{i \neq i^*} \Delta_i^{-1}\big)$, where $\Delta_i$ are suboptimality gaps and $i^*$ is the unique optimal arm. In the special case of $\lambda = 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem, where switching cost is allowed to change over time. We provide experimental evaluation showing competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.",
author = "Chlo{\'e} Rouyer and Yevgeny Seldin and Nicol{\`o} Cesa-Bianchi",
year = "2021",
language = "English",
series = "Proceedings of Machine Learning Research",
pages = "9127--9135",
booktitle = "Proceedings of the 38th International Conference on Machine Learning (ICML)",
publisher = "PMLR",
note = "38th International Conference on Machine Learning (ICML) ; Conference date: 18-07-2021 Through 24-07-2021",

}

RIS

TY - GEN

T1 - An algorithm for stochastic and adversarial bandits with switching costs

AU - Rouyer, Chloé

AU - Seldin, Yevgeny

AU - Cesa-Bianchi, Nicolò

PY - 2021

Y1 - 2021

N2 - We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3} T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\big(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big) \sum_{i \neq i^*} \Delta_i^{-1}\big)$, where $\Delta_i$ are suboptimality gaps and $i^*$ is the unique optimal arm. In the special case of $\lambda = 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem, where switching cost is allowed to change over time. We provide experimental evaluation showing competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.

M3 - Article in proceedings

T3 - Proceedings of Machine Learning Research

SP - 9127

EP - 9135

BT - Proceedings of the 38th International Conference on Machine Learning (ICML)

PB - PMLR

T2 - 38th International Conference on Machine Learning (ICML)

Y2 - 18 July 2021 through 24 July 2021

ER -

ID: 298390201
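
Illustration

The cost model and the two bound expressions stated in the abstract can be made concrete with a small sketch. This is not the authors' algorithm (the Tsallis-INF-based method is not reproduced here); it only evaluates the quantities the abstract names. The function names are hypothetical, the constants hidden by the O-notation are omitted, and it is assumed that the first round does not count as a switch and that `gaps` contains only the positive gaps of the suboptimal arms.

```python
import math
from typing import Sequence


def switching_cost_loss(losses: Sequence[float], arms: Sequence[int], lam: float) -> float:
    """Sum of the incurred losses plus lam for every change of arm.

    Assumption: the first round is not charged as a switch.
    """
    if len(losses) != len(arms):
        raise ValueError("losses and arms must have the same length")
    # A switch occurs whenever the arm differs from the previous round's arm.
    switches = sum(1 for prev, cur in zip(arms, arms[1:]) if prev != cur)
    return sum(losses) + lam * switches


def adversarial_bound_order(lam: float, K: int, T: int) -> float:
    """(lam*K)^(1/3) * T^(2/3) + sqrt(K*T); constants from the O-notation omitted."""
    return (lam * K) ** (1 / 3) * T ** (2 / 3) + math.sqrt(K * T)


def stochastic_bound_order(lam: float, K: int, T: int, gaps: Sequence[float]) -> float:
    """((lam*K)^(2/3) * T^(1/3) + ln T) * sum of 1/gap over suboptimal arms.

    Assumption: gaps holds only the positive gaps of the suboptimal arms.
    """
    inverse_gap_sum = sum(1.0 / g for g in gaps)
    return ((lam * K) ** (2 / 3) * T ** (1 / 3) + math.log(T)) * inverse_gap_sum
```

With `lam = 0` the switching terms vanish, so the first expression reduces to `sqrt(K*T)` and the second to `ln T` times the sum of inverse gaps, matching the no-switching-cost special case mentioned in the abstract.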