An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of the problem instance. In the adversarial regime, the regret guarantee remains unchanged.
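As a rough illustration of the horizon improvement only, the sketch below compares the two horizon-dependent factors $(\ln t)^3$ and $(\ln t)^2$. It deliberately ignores constants, the number of arms, and all gap-dependent terms, so it is not a computation of either regret bound; the function name `horizon_factors` is hypothetical. Because the ratio of the two factors is exactly $\ln t$, the saving grows slowly but without bound as the horizon increases.

```python
import math


def horizon_factors(t: float) -> tuple[float, float, float]:
    """Return ((ln t)^3, (ln t)^2, ratio) for a horizon t > 1.

    Illustration only: constants, the number of arms and the
    gap-dependent parts of the regret bounds are ignored.
    """
    if t <= 1:
        raise ValueError("t must exceed 1 so that ln t > 0")
    log_t = math.log(t)
    old = log_t ** 3  # horizon dependence of the earlier EXP3++ analysis
    new = log_t ** 2  # horizon dependence with the new gap estimation
    return old, new, old / new


# Compare the two factors for a few horizons.
for t in (1e3, 1e6, 1e9):
    old, new, ratio = horizon_factors(t)
    print(f"t = {t:.0e}: (ln t)^3 = {old:10.1f}, (ln t)^2 = {new:7.1f}, ratio = {ratio:.2f}")
```

The ratio column reproduces $\ln t$, which is the factor removed from the stochastic-regime bound.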
Original language | English |
---|---|
Title of host publication | Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands |
Editors | Satyen Kale, Ohad Shamir |
Publisher | Proceedings of Machine Learning Research |
Publication date | 2017 |
Pages | 1743-1759 |
Publication status | Published - 2017 |
Event | The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands; duration: 7 Jul 2017 → 10 Jul 2017; conference number: 30; http://www.learningtheory.org/colt2017/ |
Conference
Conference | The 30th Annual Conference on Learning Theory (COLT) |
---|---|
Number | 30 |
Country | Netherlands |
City | Amsterdam |
Period | 07/07/2017 → 10/07/2017 |
Internet address | http://www.learningtheory.org/colt2017/ |
Series | Proceedings of Machine Learning Research |
---|---|
Volume | 65 |
ISSN | 1938-7228 |
Links
- http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf (final published version)