Factored Bandits
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Factored Bandits. / Zimmert, Julian Ulf; Seldin, Yevgeny.
Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.. NIPS Proceedings, 2018. (Advances in Neural Information Processing Systems, Vol. 31).Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - Factored Bandits
AU - Zimmert, Julian Ulf
AU - Seldin, Yevgeny
N1 - Conference code: 32
PY - 2018
Y1 - 2018
N2 - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
AB - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
M3 - Article in proceedings
T3 - Advances in Neural Information Processing Systems
BT - Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
PB - NIPS Proceedings
T2 - 32nd Annual Conference on Neural Information Processing Systems
Y2 - 2 December 2018 through 8 December 2018
ER -
ID: 225479776