DeLTA seminar by Emilie Kaufmann
Emilie Kaufmann, INRIA Lille
Non-parametric exploration in multi-armed bandits
The multi-armed bandit (MAB) model is useful for sequential resource allocation tasks in a stochastic environment, e.g., the design of a recommendation algorithm or of an adaptive clinical trial. This simple model also captures the exploration/exploitation dilemma that is central to more structured reinforcement learning problems. The two most famous approaches to MABs, namely Upper Confidence Bounds and Thompson Sampling, both need some prior information about the arms’ distributions in order to attain optimal performance. We will discuss other families of algorithms based on re-sampling, in particular sub-sampling, which perform well in practice and can be proved optimal for several families of distributions. These algorithms can also be used when the reward-maximization objective is modified to take some notion of risk into account.
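To give a concrete feel for what "sub-sampling" exploration can mean, here is a minimal sketch in the spirit of sub-sampling duelling algorithms, under stated assumptions. It is not necessarily the exact algorithm presented in the talk. The function name `rb_sda`, the Bernoulli arms, the leader rule (most-sampled arm, ties broken by empirical mean) and the forced-exploration threshold sqrt(log r) are all assumptions made for illustration. Each round, every challenger "duels" the leader by comparing its empirical mean with the mean of a random block of the leader's past rewards of the same size. No confidence bound or prior on the reward distribution is used.

```python
import math
import random


def rb_sda(arm_means, horizon, seed=0):
    """Hypothetical sketch of a random-block sub-sampling duelling bandit.

    arm_means: Bernoulli parameters of simulated arms (assumption: Bernoulli rewards).
    horizon: total number of arm pulls (must be >= number of arms).
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    history = [[] for _ in range(n_arms)]  # observed rewards, per arm

    def pull(k):
        history[k].append(1.0 if rng.random() < arm_means[k] else 0.0)

    # Initialisation: pull every arm once.
    for k in range(n_arms):
        pull(k)
    t, r = n_arms, 1  # pulls so far, round index
    while t < horizon:
        r += 1
        counts = [len(h) for h in history]
        means = [sum(h) / len(h) for h in history]
        # Leader: most-sampled arm, ties broken by higher empirical mean.
        leader = max(range(n_arms), key=lambda k: (counts[k], means[k]))
        winners = []
        for k in range(n_arms):
            if k == leader:
                continue
            n_k = counts[k]
            # Forced exploration of rarely sampled challengers (assumed threshold).
            if n_k < math.sqrt(math.log(r)):
                winners.append(k)
                continue
            # Duel: challenger mean vs. mean of a random block of n_k
            # consecutive leader rewards (valid since the leader has >= n_k).
            start = rng.randrange(counts[leader] - n_k + 1)
            block = history[leader][start:start + n_k]
            if means[k] >= sum(block) / n_k:
                winners.append(k)
        # Pull every winning challenger, or the leader if none won.
        for k in winners or [leader]:
            if t >= horizon:
                break
            pull(k)
            t += 1
    return [len(h) for h in history]
```

For example, `rb_sda([0.1, 0.5, 0.9], horizon=3000)` should allocate most of the pulls to the third arm. Because the duel only compares empirical means on equal-sized samples, the same code runs unchanged for any bounded reward distribution. This reflects the "non-parametric" flavour the abstract refers to.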
DeLTA is a research group affiliated with the Department of Computer Science at the University of Copenhagen that studies diverse aspects of Machine Learning Theory and its applications, including, but not limited to, Reinforcement Learning, Online Learning and Bandits, and PAC-Bayesian analysis.