PhD defence by Chloé Alice Florence Rouyer
Title
Theoretical Foundations of Learning with Worst-Case and Easy Data
Abstract
Most machine learning models are built using an explore-then-exploit strategy. Different configurations of the model are explored during a training phase until one is selected and then deployed to be exploited on new data. This approach has a major downside: guarantees on the performance of the model only hold as long as the data used for the training phase follows the same distribution as the data used later on. In practice, this assumption may be unrealistic.
We study a different framework where exploration and exploitation of the model are not separated. In this setting, a learner has to repeatedly choose an action in a list, obtain the reward associated with that action and observe some feedback to refine her future choices. As this feedback may depend on the action chosen by the learner, the best action to play in each round depends on a trade-off between exploration and exploitation: Should the learner play the action that has the highest expected reward based on the currently available data, or should she play an action that has not been sufficiently played in the past in order to gather more information? Considering this trade-off at each round allows the learner to adapt to more challenging environments. Originally, this framework was studied by making the strong assumption that the environment provides easy data. The learner can take advantage of this structure to obtain fast learning guarantees. The opposite scenario where the learner forgoes any assumption on the data and runs as if the environment will provide worst-case data has also been studied and led to the development of robust but slower learning algorithms. Recently the goal has been to derive algorithms that adapt to both worst-case and easy data simultaneously.
In this work, we consider several online learning problems. In each case, the learner is faced with different constraints on her behavior and on the feedback she is allowed to observe, which affect the trade-off between exploration and exploitation in different ways. We propose algorithms for each of those problems and analyze them to provide theoretical guarantees against both worst-case and easy data.
Supervisors
Professor Yevgeny Seldin, Department of Computer Science, UCPH
Assessment Committee
Professor Christian Igel, Department of Computer Science, UCPH
Professor Gábor Lugosi, Pompeu Fabra University
CNRS Researcher Emilie Kaufmann, INRIA Lille
Moderator of defence: Assistant Professor Sadegh Talebi, Department of Computer Science, UCPH
For an electronic copy of the thesis, please visit the PhD Programme page.