Exploiting Easiness and Overcoming Delays in Online Learning

Research output: Book/Report › Ph.D. thesis

  • Tobias Sommer Thune
In machine learning we work towards building algorithms that can solve complex tasks by learning how to solve them, rather than knowing how to solve them by design. Online learning is the subfield focusing on simultaneous execution and learning, that is, learning while a task is “live” or online. Imagine a medical trial where we want to identify the best drug for some illness. Instead of setting aside a portion of patients for testing, we might be able to cure more people by treating all patients as part of an online task and optimising the total number we cure. An algorithm in this scenario must balance two objectives: exploring the options in order to gather sufficient knowledge of the task, and choosing what currently seems to be the best option in order to perform well.
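To make this trade-off concrete, here is a minimal sketch (not an algorithm from the thesis) of an epsilon-greedy strategy for the drug-trial example: with a small probability the algorithm explores a random drug, otherwise it exploits the drug with the best cure rate observed so far. All names, parameters, and the cure rates are illustrative.

```python
import random

def epsilon_greedy_trial(cure_prob, n_patients, epsilon=0.1, seed=0):
    """Toy drug-trial bandit: each arm is a drug with an unknown cure rate.

    With probability `epsilon` we explore a uniformly random drug;
    otherwise we exploit the drug with the best empirical cure rate.
    """
    rng = random.Random(seed)
    n_arms = len(cure_prob)
    cures = [0] * n_arms   # observed cures per drug
    pulls = [0] * n_arms   # times each drug was administered
    total_cured = 0
    for _ in range(n_patients):
        if rng.random() < epsilon or min(pulls) == 0:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: cures[a] / pulls[a])  # exploit
        reward = 1 if rng.random() < cure_prob[arm] else 0
        pulls[arm] += 1
        cures[arm] += reward
        total_cured += reward
    return total_cured

# e.g. three candidate drugs with (unknown to the algorithm) cure rates
print(epsilon_greedy_trial([0.3, 0.5, 0.7], n_patients=1000))
```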

Using the theoretical framework of “multi-armed bandits”, we explore two variations of online learning scenarios.

First, we construct an algorithm capable of performing better if the task has a certain structure that makes it easier. This is possible for two kinds of structure simultaneously, without prior knowledge of the setting, and while remaining robust to harder settings.
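The abstract does not spell out the algorithm at this level. For context only, adaptive bandit algorithms of this kind typically build on adversarial baselines such as EXP3, sketched below under the assumption of losses in [0, 1]; the thesis algorithm itself is more refined than this sketch, and all names here are illustrative.

```python
import math
import random

def exp3(loss_fn, n_arms, n_rounds, eta=0.1, seed=0):
    """Standard EXP3 sketch for adversarial bandits (background, not the
    thesis algorithm): maintain weights over arms, sample an arm, and
    update with an importance-weighted estimate of the observed loss."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)          # observed loss, assumed in [0, 1]
        est = loss / probs[arm]         # importance-weighted loss estimate
        weights[arm] *= math.exp(-eta * est)
    return weights
```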

Second, we explore how to deal with the feedback from the algorithm’s actions being delayed. We extend prior approaches to the case where the delay may vary over time. Here we develop a new technique of skipping feedback if it is excessively delayed, and prove a conjectured bound on this algorithm’s performance. In addition, we show that in such problems our algorithms perform much better than was previously thought possible, and we construct examples of tasks where this is the case.
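A minimal sketch of the skipping idea, assuming an EXP3-style learner, a delay function known for the simulation, and a fixed skipping threshold (the thesis handles the threshold in a more sophisticated way; all names are illustrative):

```python
import math
import random

def delayed_exp3_with_skipping(loss_fn, delay_fn, n_arms, n_rounds,
                               eta=0.1, max_delay=50, seed=0):
    """EXP3-style loop where the feedback for round t arrives delay_fn(t)
    rounds later; feedback delayed by more than `max_delay` is skipped.
    The fixed threshold is a simplification for illustration."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pending = []  # (arrival_round, played_round, arm, prob_at_play, loss)
    for t in range(n_rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        d = delay_fn(t)  # delay may vary over time
        pending.append((t + d, t, arm, probs[arm], loss_fn(t, arm)))
        # process feedback arriving in this round
        arrived = [p for p in pending if p[0] == t]
        pending = [p for p in pending if p[0] > t]
        for _, played_at, a, p_a, loss in arrived:
            if t - played_at > max_delay:
                continue  # skip excessively delayed feedback
            weights[a] *= math.exp(-eta * loss / p_a)  # importance-weighted update
    return weights
```

Note that the update reuses the sampling probability from the round the arm was played, so the importance-weighted estimate stays unbiased for the feedback that is not skipped.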
Original language: English
Publisher: Department of Computer Science, Faculty of Science, University of Copenhagen
Publication status: Published - 2020
