
Talk: Tutorial - The Space of Online Learning Problems

Talk by Yevgeny Seldin, assistant professor in the Image Section. He will introduce the space of online learning problems and describe a number of new algorithms, each of which solves a range of problems within this space.

Abstract:

Nowadays, the rate and volume of information flow are increasing sharply, forcing numerous domains to switch from offline to online information processing. Online learning is a subfield of machine learning that provides the intelligence behind many online data processing systems. It has already revolutionized personalization and advertising on the Internet and is rapidly penetrating many other domains.

We will introduce the space of online learning problems. The space is spanned by three axes corresponding to three major parameters of online learning problems:

  1. The feedback axis, measuring the amount of feedback the algorithm obtains at every round. We will primarily focus on full information and limited (a.k.a. bandit) feedback; see the sketch after this list for the contrast between the two.
  2. The environmental resistance axis, where we primarily distinguish between i.i.d. (a.k.a. stochastic) and adversarial environments.
  3. The structural complexity axis, measuring the structural complexity of a problem. We will primarily focus on stateless problems and problems with side information (state).
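
To make the feedback axis concrete, below is a minimal sketch (not part of the tutorial material) of an EXP3-style forecaster for the limited-feedback (bandit) end of the axis: only the loss of the arm actually pulled is observed, and it is importance-weighted before the exponential update. The learning-rate choice, the loss_fn interface, and the toy environment are illustrative assumptions.

```python
import numpy as np

def exp3(loss_fn, K, T, eta=None, rng=None):
    """EXP3-style forecaster for adversarial bandits with losses in [0, 1].

    loss_fn(t, a) returns the loss of the single pulled arm a at round t;
    only this one loss is observed (bandit feedback), in contrast to the
    full-information setting where all K losses are revealed.
    """
    if rng is None:
        rng = np.random.default_rng()
    if eta is None:
        eta = np.sqrt(2.0 * np.log(K) / (K * T))  # illustrative tuning
    weights = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()            # sampling distribution over arms
        a = rng.choice(K, p=p)                 # pull one arm
        loss = loss_fn(t, a)
        total_loss += loss
        est = np.zeros(K)
        est[a] = loss / p[a]                   # importance-weighted loss estimate
        weights *= np.exp(-eta * est)          # only the pulled arm's weight changes
    return total_loss

# Example usage: an arbitrary toy environment with fixed per-arm loss means.
rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.6])
loss = exp3(lambda t, a: float(rng.binomial(1, means[a])), K=3, T=1000, rng=rng)
print(f"cumulative loss over 1000 rounds: {loss:.1f}")
```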

We will review a number of classical online learning problems: prediction with expert advice, adversarial multi-armed bandits, stochastic multi-armed bandits, and bandits with side information (Cesa-Bianchi and Lugosi, 2006; Bubeck and Cesa-Bianchi, 2012).
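
For comparison with the full-information end of the feedback axis, here is a similarly minimal sketch of the exponential weights (Hedge) forecaster for prediction with expert advice: all expert losses are revealed at every round and all weights are updated. Again, the learning-rate choice and the toy loss matrix are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def hedge(expert_losses, eta=None):
    """Exponentially weighted average forecaster (Hedge).

    expert_losses: T x K array, where expert_losses[t, k] is the loss in [0, 1]
    of expert k at round t (full-information feedback: all losses are observed).
    Returns the forecaster's cumulative loss and its regret against the best
    expert in hindsight.
    """
    T, K = expert_losses.shape
    if eta is None:
        # Standard learning rate for losses in [0, 1] over a known horizon T.
        eta = np.sqrt(2.0 * np.log(K) / T)
    weights = np.ones(K)
    forecaster_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()                  # play the normalized weights
        forecaster_loss += p @ expert_losses[t]
        weights *= np.exp(-eta * expert_losses[t])   # update every expert's weight
    best_expert_loss = expert_losses.sum(axis=0).min()
    return forecaster_loss, forecaster_loss - best_expert_loss

# Example: 3 experts, 1000 rounds of i.i.d. Bernoulli losses (an arbitrary toy environment).
rng = np.random.default_rng(0)
losses = rng.binomial(1, [0.3, 0.5, 0.6], size=(1000, 3)).astype(float)
total, regret = hedge(losses)
print(f"forecaster loss: {total:.1f}, regret vs. best expert: {regret:.1f}")
```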

Most of the classical results in online learning correspond to isolated points in the space of online learning problems. In the second part of the tutorial we will describe a number of new algorithms that solve ranges of problems in this space. The results include algorithms that interpolate between full information and bandit feedback (Seldin et al., 2014; Kale, 2014); algorithms that interpolate between adversarial and i.i.d. (or in some other sense "sub-adversarial") environments in the full-information setting (Gaillard et al., 2014; Koolen et al., 2014; Wintenberger, 2015; Koolen and van Erven, 2015; Luo and Schapire, 2015) and in the bandit setting (Bubeck and Slivkins, 2012; Seldin and Slivkins, 2014); and algorithms that interpolate between stateless bandits and bandits with side information (Seldin et al., 2011). These results open a new era in online learning research, in which researchers progress from studying isolated points in the space of online learning problems to studying ranges of problems.

** This will be a practice run of a tutorial that I am giving at ECML-PKDD. It will take 3 hours, with a 15-minute break in between.