DeLTA seminar by Julian Zimmert: PRODuctive bandits: Importance Weighting No More
Follow this link to participate on Zoom
Speaker
Julian Zimmert, Google Research
Title
PRODuctive bandits: Importance Weighting No More
Abstract
Prod is a seminal algorithm in full-information online learning, which has been conjectured to be fundamentally sub-optimal for multi-armed bandits.
By leveraging the interpretation of Prod as a first-order OMD approximation, we present the following surprising results:
1. Variants of Prod can obtain optimal regret for adversarial multi-armed bandits.
- There exists a simple and (arguably) importance-weighting free variant with optimal rate.
3. One can even achieve best-both-worlds guarantees with logarithmic regret in the stochastic regime.
The bandit algorithms in this work use simple arithmetic update rules without the need of solving optimization problems typical in prior work. Finally, the results directly improve the state of the art of incentive-compatible bandits.
Joint work with Teodor V. Marinov.
_____________________________
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar
DeLTA Lab page