An Optimal Algorithm for Stochastic and Adversarial Bandits

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Julian Ulf Zimmert
Yevgeny Seldin

We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power α=1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms UCB1 and EXP3 in stochastic environments. In certain adversarial regimes the algorithm significantly outperforms UCB1 and Thompson Sampling, which exhibit almost linear regret.
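To make the abstract's description concrete, the following is a minimal Python sketch of online mirror descent with a Tsallis entropy regularizer of power α=1/2 for a multi-armed bandit. The abstract does not give the update rule, so several details here are assumptions for illustration: the closed form `w_i = 4 / (eta * (L_i - x))**2` for the weights, the learning rate `eta = 2 / sqrt(t)`, the importance-weighted loss estimator, and the use of bisection to find the normalization constant `x`. The function names `tsallis_weights` and `tsallis_inf_sketch` are hypothetical.

```python
import math
import random


def tsallis_weights(cum_loss, eta, iters=60):
    """Return sampling weights w_i = 4 / (eta * (L_i - x))**2 that sum to 1.

    Assumed form of the 1/2-Tsallis mirror descent step; the normalizer x
    is found by bisection (chosen here for simplicity and robustness).
    """
    k = len(cum_loss)
    min_loss = min(cum_loss)

    def weights_at(x):
        return [4.0 / (eta * (loss - x)) ** 2 for loss in cum_loss]

    # At lo every weight is at most 1/k, so the sum is at most 1.
    # At hi the best arm alone has weight 1, so the sum is at least 1.
    lo = min_loss - 2.0 * math.sqrt(k) / eta
    hi = min_loss - 2.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(weights_at(mid)) > 1.0:
            hi = mid
        else:
            lo = mid
    w = weights_at(0.5 * (lo + hi))
    total = sum(w)
    return [wi / total for wi in w]  # remove residual rounding error


def tsallis_inf_sketch(loss_fn, k, horizon, rng):
    """Run the sketch for `horizon` rounds; losses are assumed to lie in [0, 1]."""
    cum_loss = [0.0] * k  # importance-weighted cumulative loss estimates
    total_loss = 0.0
    w = [1.0 / k] * k
    for t in range(1, horizon + 1):
        eta = 2.0 / math.sqrt(t)  # assumed anytime learning rate
        w = tsallis_weights(cum_loss, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        cum_loss[arm] += loss / w[arm]  # unbiased estimate for the played arm
    return total_loss, w


# Usage: a two-armed stochastic bandit with Bernoulli losses (hypothetical means).
means = [0.2, 0.8]
rng = random.Random(0)
total_loss, final_w = tsallis_inf_sketch(
    lambda t, arm: 1.0 if rng.random() < means[arm] else 0.0,
    k=2,
    horizon=2000,
    rng=rng,
)
```

The learning rate depends only on the round index `t`, which mirrors the abstract's claim that neither the regime nor the time horizon needs to be known in advance.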

Original language	English
Title	Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)
Editors	Kamalika Chaudhuri, Masashi Sugiyama
Publisher	PMLR
Publication date	2019
Pages	467-475
Status	Published - 2019
Event	22nd International Conference on Artificial Intelligence and Statistics (AISTATS) - Naha, Okinawa, Japan Duration: 16 Apr 2019 → 18 Apr 2019

Conference

Conference	22nd International Conference on Artificial Intelligence and Statistics (AISTATS)
Country	Japan
City	Naha, Okinawa
Period	16/04/2019 → 18/04/2019

Name	Proceedings of Machine Learning Research
Volume	89
ISSN	1938-7228

Datalogisk Institut
