DeLTA seminar by Emilie Kaufmann
Speaker
Emilie Kaufmann, INRIA Lille
Title
Non-parametric exploration in multi-armed bandits
Abstract
The multi-armed bandit (MAB) model is useful for sequential resource allocation tasks in a stochastic environment, e.g., the design of a recommendation algorithm or an adaptive clinical trial. This simple model also captures the exploration/exploitation dilemma that is central to more structured reinforcement learning problems. The two most famous approaches to MABs, namely Upper Confidence Bounds and Thompson Sampling, both need some prior information about the arms’ distributions in order to attain optimal performance. We will discuss other families of algorithms, based on re-sampling and in particular sub-sampling, that perform well in practice and can be proved to be optimal for different families of distributions. Moreover, they can also be used when the reward maximization objective is modified to take into account some notion of risk.
_____________________________
You can subscribe to the DeLTA Seminar mailing list by sending an empty email to delta-seminar-join@list.ku.dk.
DeLTA is a research group affiliated with the Department of Computer Science at the University of Copenhagen studying diverse aspects of Machine Learning Theory and its applications, including, but not limited to, Reinforcement Learning, Online Learning and Bandits, and PAC-Bayesian analysis.