DeLTA seminar by Julian Zimmert: PRODuctive bandits: Importance Weighting No More

Delta Logo

Follow this link to participate on Zoom

Speaker

Julian Zimmert, Google Research

Title

PRODuctive bandits: Importance Weighting No More

Abstract

Prod is a seminal algorithm in full-information online learning, which has been conjectured to be fundamentally sub-optimal for multi-armed bandits.
By leveraging the interpretation of Prod as a first-order OMD approximation, we present the following surprising results:
1. Variants of Prod can obtain optimal regret for adversarial multi-armed bandits.

  1. There exists a simple and (arguably) importance-weighting free variant with optimal rate.
    3. One can even achieve best-both-worlds guarantees with logarithmic regret in the stochastic regime.

    The bandit algorithms in this work use simple arithmetic update rules without the need of solving optimization problems typical in prior work. Finally, the results directly improve the state of the art of incentive-compatible bandits.

 

Joint work with Teodor V. Marinov.

_____________________________

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar
DeLTA Lab page