An Optimal Algorithm for Stochastic and Adversarial Bandits

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

An Optimal Algorithm for Stochastic and Adversarial Bandits. / Zimmert, Julian Ulf; Seldin, Yevgeny.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). ed. / Kamalika Chaudhuri; Masashi Sugiyama. PMLR, 2019. p. 467-475 (Proceedings of Machine Learning Research, Vol. 89).

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Zimmert, JU & Seldin, Y 2019, An Optimal Algorithm for Stochastic and Adversarial Bandits. in K Chaudhuri & M Sugiyama (eds), Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, Proceedings of Machine Learning Research, vol. 89, pp. 467-475, 22nd International Conference on Artificial Intelligence and Statistics (AISTAT), Naha, Okinawa, Japan, 16/04/2019. <http://proceedings.mlr.press/v89/zimmert19a/zimmert19a-supp.pdf>

APA

Zimmert, J. U., & Seldin, Y. (2019). An Optimal Algorithm for Stochastic and Adversarial Bandits. In K. Chaudhuri, & M. Sugiyama (Eds.), Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) (pp. 467-475). PMLR. Proceedings of Machine Learning Research Vol. 89 http://proceedings.mlr.press/v89/zimmert19a/zimmert19a-supp.pdf

Vancouver

Zimmert JU, Seldin Y. An Optimal Algorithm for Stochastic and Adversarial Bandits. In Chaudhuri K, Sugiyama M, editors, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR. 2019. p. 467-475. (Proceedings of Machine Learning Research, Vol. 89).

Author

Zimmert, Julian Ulf ; Seldin, Yevgeny. / An Optimal Algorithm for Stochastic and Adversarial Bandits. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). editor / Kamalika Chaudhuri ; Masashi Sugiyama. PMLR, 2019. pp. 467-475 (Proceedings of Machine Learning Research, Vol. 89).

Bibtex

@inproceedings{691f32db1f274f3b8dd7df37b3635cdd,
title = "An Optimal Algorithm for Stochastic and Adversarial Bandits",
abstract = "We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power α=1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms UCB1 and EXP3 in stochastic environments. In certain adversarial regimes the algorithm significantly outperforms UCB1 and Thompson Sampling, which exhibit almost linear regret.",
author = "Zimmert, {Julian Ulf} and Yevgeny Seldin",
year = "2019",
language = "English",
series = "Proceedings of Machine Learning Research",
pages = "467--475",
editor = "Chaudhuri, {Kamalika } and Sugiyama, {Masashi }",
booktitle = "Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)",
publisher = "PMLR",
note = "22nd International Conference on Artificial Intelligence and Statistics (AISTAT) ; Conference date: 16-04-2019 Through 18-04-2019",

}

RIS

TY - GEN

T1 - An Optimal Algorithm for Stochastic and Adversarial Bandits

AU - Zimmert, Julian Ulf

AU - Seldin, Yevgeny

PY - 2019

Y1 - 2019

N2 - We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power α=1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms UCB1 and EXP3 in stochastic environments. In certain adversarial regimes the algorithm significantly outperforms UCB1 and Thompson Sampling, which exhibit almost linear regret.

AB - We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power α=1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms UCB1 and EXP3 in stochastic environments. In certain adversarial regimes the algorithm significantly outperforms UCB1 and Thompson Sampling, which exhibit almost linear regret.

M3 - Article in proceedings

T3 - Proceedings of Machine Learning Research

SP - 467

EP - 475

BT - Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)

A2 - Chaudhuri, Kamalika

A2 - Sugiyama, Masashi

PB - PMLR

T2 - 22nd International Conference on Artificial Intelligence and Statistics (AISTAT)

Y2 - 16 April 2019 through 18 April 2019

ER -

ID: 225479605