Factored Bandits

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Factored Bandits. / Zimmert, Julian Ulf; Seldin, Yevgeny.

Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.. NIPS Proceedings, 2018. (Advances in Neural Information Processing Systems, Vol. 31).

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Zimmert, JU & Seldin, Y 2018, Factored Bandits. in Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.. NIPS Proceedings, Advances in Neural Information Processing Systems, vol. 31, 32nd Annual Conference on Neural Information Processing Systems, Montreal, Canada, 02/12/2018. <https://papers.nips.cc/paper/7548-factored-bandits>

APA

Zimmert, J. U., & Seldin, Y. (2018). Factored Bandits. In Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. NIPS Proceedings. Advances in Neural Information Processing Systems Vol. 31 https://papers.nips.cc/paper/7548-factored-bandits

Vancouver

Zimmert JU, Seldin Y. Factored Bandits. In Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.. NIPS Proceedings. 2018. (Advances in Neural Information Processing Systems, Vol. 31).

Author

Zimmert, Julian Ulf ; Seldin, Yevgeny. / Factored Bandits. Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.. NIPS Proceedings, 2018. (Advances in Neural Information Processing Systems, Vol. 31).

Bibtex

@inproceedings{593f14d7b2004a8983b94c83b5c50b7d,

title = "Factored Bandits",

abstract = "We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).",

author = "Zimmert, {Julian Ulf} and Yevgeny Seldin",

year = "2018",

language = "English",

series = "Advances in Neural Information Processing Systems",

publisher = "NIPS Proceedings",

booktitle = "Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr{\'e}al, Canada.",

note = "32nd Annual Conference on Neural Information Processing Systems, NeurIPS ; Conference date: 02-12-2018 Through 08-12-2018",

url = "https://nips.cc/Conferences/2018",

}

RIS

TY - GEN

T1 - Factored Bandits

AU - Zimmert, Julian Ulf

AU - Seldin, Yevgeny

N1 - Conference code: 32

PY - 2018

Y1 - 2018

N2 - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).

AB - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).

M3 - Article in proceedings

T3 - Advances in Neural Information Processing Systems

BT - Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

PB - NIPS Proceedings

T2 - 32nd Annual Conference on Neural Information Processing Systems

Y2 - 2 December 2018 through 8 December 2018

ER -

ID: 225479776

Department of Computer Science