DeLTA seminar by Emilie Kaufmann

Speaker

Emilie Kaufmann, INRIA Lille

Title

Non-parametric exploration in multi-armed bandits

Abstract

The multi-armed bandit (MAB) model is useful for sequential resource allocation tasks in a stochastic environment, e.g., the design of a recommendation algorithm or an adaptive clinical trial. This simple model also captures the exploration/exploitation dilemma that is central to more structured reinforcement learning problems. The two most famous approaches to MABs, namely Upper Confidence Bounds and Thompson Sampling, both need some prior information about the arms’ distributions in order to attain optimal performance. We will discuss other families of algorithms based on re-sampling, and in particular sub-sampling, that perform well in practice and can be proved to be optimal for different families of distributions. Moreover, they can also be used when the reward maximization objective is modified to take into account some notion of risk.
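
For readers new to the topic, the following is a minimal sketch of the classical UCB1 index policy on Bernoulli arms. It is not taken from the talk and is not one of the sub-sampling algorithms the abstract refers to. The function name `ucb1`, the arm means, and the horizon are illustrative assumptions. The exploration bonus used here assumes rewards bounded in [0, 1], which is an example of the kind of prior information about the arms that the abstract mentions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return the pull count per arm.

    arm_means: hypothetical success probabilities, unknown to the learner.
    horizon: total number of rounds to play.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls of each arm
    sums = [0.0] * n_arms   # cumulative reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus.
            # The bonus assumes rewards lie in [0, 1].
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000))
```

The bonus term shrinks as an arm is pulled more often, so the policy keeps exploring rarely played arms while mostly exploiting the empirically best one.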

_____________________________

You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
Online calendar
DeLTA Lab page