DeLTA seminar by Emilie Kaufmann


Zoom link


Emilie Kaufmann, INRIA Lille



Non-parametric exploration in multi-armed bandits


The multi-armed bandit (MAB) model is useful for sequential resource allocation tasks in a stochastic environment, such as the design of a recommendation algorithm or an adaptive clinical trial. This simple model also captures the exploration/exploitation dilemma that is central to more structured reinforcement learning problems. The two best-known approaches to MABs, Upper Confidence Bound (UCB) algorithms and Thompson Sampling, both need some prior information about the arms' distributions in order to attain optimal performance. We will discuss other families of algorithms based on resampling, and in particular on sub-sampling, which perform well in practice and can be proved optimal for different families of distributions. Moreover, these algorithms can also be used when the reward maximization objective is modified to take some notion of risk into account.
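To give a rough idea of what "sub-sampling" exploration means, here is a minimal sketch of a two-armed sub-sampling duel in the spirit of BESA. It is a simplified illustration under stated assumptions, not the algorithms presented in the talk. The function names `subsample_duel_choice` and `run`, the Bernoulli reward model, and the rule of breaking ties in favour of the less-sampled arm are all hypothetical choices made for this example. The key idea is that no parametric model of the rewards is used: the arm with more observations is sub-sampled (without replacement) down to the size of the other arm's history, and the arm with the higher empirical mean on these equal-size samples is pulled.

```python
import random


def subsample_duel_choice(history_a, history_b, rng):
    """Return 0 or 1: the arm chosen by a sub-sampling duel (hypothetical helper).

    The better-sampled arm is sub-sampled without replacement down to the
    size of the other arm's history, so both means use the same sample size.
    """
    n_a, n_b = len(history_a), len(history_b)
    # Pull every arm at least once before duelling.
    if n_a == 0:
        return 0
    if n_b == 0:
        return 1
    m = min(n_a, n_b)
    sample_a = history_a if n_a == m else rng.sample(history_a, m)
    sample_b = history_b if n_b == m else rng.sample(history_b, m)
    mean_a = sum(sample_a) / m
    mean_b = sum(sample_b) / m
    if mean_a > mean_b:
        return 0
    if mean_b > mean_a:
        return 1
    # Assumed tie rule: favour the less-sampled arm to keep exploring.
    return 0 if n_a <= n_b else 1


def run(means, horizon, seed=0):
    """Simulate the duel on two Bernoulli arms; return the pull counts."""
    rng = random.Random(seed)
    histories = [[], []]
    pulls = [0, 0]
    for _ in range(horizon):
        arm = subsample_duel_choice(histories[0], histories[1], rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        histories[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run(means=(0.2, 0.8), horizon=2000))
```

In a run such as `run((0.2, 0.8), 2000)`, most pulls should go to the second arm. The algorithm only compares empirical means of raw observations, which is why approaches of this kind can be applied without assuming a parametric family for the arms' distributions.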


You can subscribe to the DeLTA Seminar mailing list by sending an empty email to
Online calendar
DeLTA Lab page