# Machine Learning Theory

The amount and complexity of available data is steadily increasing. To make use of this wealth of information, computing systems are needed that turn the data into knowledge.

Machine learning is about developing the required algorithms and software systems that automatically analyse data for making predictions, categorizations, and recommendations. Machine learning algorithms are already an integral part of today's computing systems – for example in search engines, recommender systems, or biometrical applications – and have reached superhuman performance in some domains. DIKU's research pushes the boundaries and aims at more robust, more efficient, and more widely applicable machine learning techniques.

## Machine learning at DIKU

Machine learning is a branch of computer science and applied statistics covering algorithms that improve their performance at a given task based on sample data or experience. The machine learning research at DIKU, the Department of Computer Science at the University of Copenhagen, is concerned with the design and analysis of adaptive systems for pattern recognition and behaviour generation.

Our fields of expertise are

- Classification, regression, and density estimation techniques for data mining and modelling, pattern recognition, and time series prediction
- Online and reinforcement learning
- Computational intelligence methods for non-linear optimisation including vector optimisation and multi-criteria decision making
- Statistical and computational learning theory

Successful real-world applications include the design of biometric and medical image processing systems, chemical processes and plants, advanced driver assistance systems, robot controllers, time series predictors for physical processes, systems for sports analytics, acoustic signal classification systems, automatic quality control for production lines, and sequence analysis in bioinformatics.

To build efficient and autonomous machine learning systems we draw inspiration from optimisation and computing theory as well as biological information processing. We analyse our algorithms theoretically and critically evaluate them on real-world problems. Increasing the robustness and improving scalability of self-adaptive, learning computer systems are cross-cutting issues in our work. The following sections highlight some of our research activities.

## Efficient autonomous machine learning

We strive for computer systems that can deal autonomously and flexibly with our needs. They must work in scenarios that have not been fully specified and must be able to cope with unpredicted situations. Incomplete descriptions of application scenarios are inevitable because we need algorithms for domains where the designer's knowledge is not perfect, the solutions to particular problems are simply unknown, and/or the sheer complexity and variability of the task and the environment precludes a sufficiently accurate domain description. Although such systems are in general too complex to be designed manually, large amounts of data describing the task and the environment are often available or can be automatically obtained. To take proper advantage of this available information, we need to develop systems that self-adapt and automatically improve based on sample data – systems that learn.

Machine learning algorithms are already an integral part of today's computing systems, for example in internet search engines, recommender systems, or biometrical applications. Highly specialised technical solutions for restricted task domains exist that have reached superhuman performance. Despite these successes, there are fundamental challenges that must be met if we are to develop more general learning systems.

First, present adaptive systems often lack *autonomy* and *robustness*. For example, they usually require a human expert to select the training examples, the learning method and its parameters, and an appropriate representation or structure for the learning system. This dependence on expert supervision is retarding the ubiquitous deployment of adaptive software systems. We therefore work on algorithms that can handle large multimodal data sets, that actively select training patterns, and that autonomously build appropriate internal representations based on data from different sources. These representations should foster learning, generalisation, and communication. Second, current adaptive systems succumb to scalability problems.

On the one hand, the ever growing amounts of data require highly efficient large-scale learning algorithms. On the other hand, learning and generalisation from very few examples is also a challenging problem. This scenario often occurs in man-machine interaction, for example in software personalisation or when generalisation from few database queries is required. We address the scaling problems by using task-specific architectures incorporating both new concepts inspired by natural adaptive systems as well as recent methods from algorithmic engineering and mathematical programming.

## Selected methods

We address all major learning paradigms, unsupervised, supervised, and reinforcement learning. These are closely connected. For instance, unsupervised learning can be used to find appropriate representations for supervised learning and reliable supervised learning techniques are the prerequisite for successful reinforcement learning. Over the years, we used, analysed, and refined a broad spectrum of machine learning techniques. Currently our methodological research focuses on the following methods.

### Supervised learning

Support vector machines (SVMs) and other kernel-based algorithms are state-of-the-art in pattern recognition. They perform well in many applications, especially in classification tasks. The *kernel trick* allows for an easy handling of non-standard data (e.g., biological sequences, multimodal data) and permits a better mathematical analysis of the adaptive system because of the convenient structure of the hypothesis space. Developing and analysing kernel-based methods, in particular increasing autonomy and improving scalability of SVMs, is currently one of the most active branches of our research.

### Unsupervised and deep learning

Deep learning refers to machine learning architectures composed of multiple levels of non-linear transformations. The idea is to extract more and more abstract features from input data, learning more abstract representations. These representations can be learned in a supervised and/or unsupervised manner. A prominent example are *convolutional neural networks*, which we apply, for example, to medical image analysis

Furthermore, we employ probabilistic generative models to learn and to describe probability distributions. Our research focuses on Markov random fields, in which the conditional independence structure between random variables is described by an undirected graph. We are particularly interested in models that allow for learning hierarchical representations of data in an unsupervised manner. These models include *restricted Boltzmann machines*, the building blocks of *deep belief networks*.

### Online learning

Most of existing online learning algorithms operate under narrow (essentially, singular) assumptions about the nature of data generating processes, which are rarely satisfied in practice. Moreover, the nature of many data generating processes changes with time making it impossible to model them using singular assumptions. The mismatch between theoretical assumptions and reality leads to suboptimality of existing algorithms often reaching orders of magnitude.We are entering a big data age, where the Volume, Velocity, and Variety of information are growing at the fastest rates ever. An increasing number of data processing systems are switching from offline to online information processing, which allows continuous monitoring, analysis, and instantaneous reaction to new information. *Online learning* is a branch of machine learning providing theoretical foundations and intelligence for online data processing systems.

We are developing a systematic approach to studying problem ranges rather than narrowly defined isolated problems, we focus on *The Space of Online Learning Problems* (see figure). For example, independently identically distributed (i.i.d.) and adversarial environments were studied separately for decades not only in online learning, but also in the closely related online algorithms and game theoretic communities. However, it was recently shown (including one our work) that in some simple scenarios there exist algorithms that are almost optimal for both kinds of processes, as well as for a whole range of intermediate regimes. We are extending and systematizing this kind of results to much broader and complex problem ranges and complementing them with lower bounds quantifying the cost of relaxed assumptions and outlining the limits of our approach. We name the project *a new generation of online learning algorithms*, since studies of problem ranges raise the field of online learning to an absolutely new level, both in terms of theoretical challenges and practical performance.

For further information about this project check The Space of Online Learning Problems tutorial.

### Non-linear optimisation

Learning is closely linked to optimisation. Thus, we are also working on general gradient-based and direct search and optimisation algorithms. This includes randomised methods, especially evolutionary algorithms (EAs), which are inspired by neo-Darwinian evolution theory. Efficient evolutionary optimisation can be achieved by an automatic adjustment of the search strategy. We are developing EAs with this ability, especially real-valued EAs that learn the metric underlying the problem at hand (e.g., dependencies between variables). Currently, we are working on variable-metric EAs for RL and for efficient vector (multi-objective) optimisation. The latter will become increasingly relevant for industrial and scientific applications in the future, because many problems are inherently multi-objective.

### Reinforcement learning

The feedback in today's most challenging applications for adaptive systems is sparse, unspecific, and/or delayed, for instance in autonomous robotics or in man-machine interaction. Supervised learning cannot be used directly in such a case, but the task can be cast into a reinforcement learning (RL) problem. Reinforcement learning is learning from the consequences of interactions with an environment without being explicitly taught. Because the performance of standard RL techniques is falling short of expectations, we are developing new RL algorithms relying on gradient-based and evolutionary direct policy search.

Please click here for a full list of Yevgeny's papers.

- Niklas Thiemann, Christian Igel, Olivier Wintenberger, and Yevgeny Seldin. A Strongly Quasiconvex PAC-Bayesian Bound. In
*Algorithmic Learning Theory (ALT)*, 2017 - Oswin Krause, Dídac R. Arbonès, and Christian Igel. CMA-ES with Optimal Covariance Update and Storage Complexity. In
*Advances in Neural Information Processing Systems (NIPS)*, 2016 - Ürün Dogan, Tobias Glasmachers, and Christian Igel. A Unified View on Multi-class Support Vector Classification.
*Journal of Machine Learning Research*17(45), 2016 - Fabian Gieseke, Justin Heinermann, Cosmin Oancea, and Christian Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs.
*JMLR W&CP*32 (ICML), 2014 - Yevgeny Seldin, Peter L. Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori. Prediction with limited advice and multiarmed bandits with paid observations. In
*JMLR W&CP*, 32 (ICML), 2014 - Yevgeny Seldin and Aleksandrs Slivkins. One practical algorithm for both stochastic and adversarial bandits. In
*JMLR W&CP*, 32 (ICML), 2014 - Kai Brügge, Asja Fischer, and Christian Igel. The flip-the-state transition operator for restricted Boltzmann machines.
*Machine Learning*13, pp. 53-69, 2013 - Fabian Gieseke, Christian Igel, and Tapio Pahikkala. Polynomial runtime bounds for fixed-rank unsupervised least-squares classification.
*JMLR W&CP*29*(ACML)*, pp. 62-71, 2013 - Oswin Krause, Asja Fischer, Tobias Glasmachers, and Christian Igel. Approximation properties of DBNs with binary hidden units and real-valued visible units.
*JMLR W&CP*28*(ICML)*, pp. 419–426, 2013 - Ilya Tolstikhin and Yevgeny Seldin. PAC-Bayes-Empirical-Bernstein Inequality. In
*Advances in Neural Information Processing Systems (NIPS)*, 2013 - Kim Steenstrup Pedersen, Kristoffer Stensbo-Smidt, Andrew Zirm, and Christian Igel. Shape Index Descriptors Applied to Texture-Based Galaxy Analysis.
*International Conference on Computer Vision (ICCV)*, pp 2440-2447, IEEE Press, 2013 - Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, and Peter Auer. PAC-Bayesian inequalities for martingales.
*IEEE Transactions on Information Theory*, 58(12), pp. 7086-7093, 2012 - Asja Fischer and Christian Igel. Bounding the Bias of Contrastive Divergence Learning.
*Neural Computation*23, pp. 664-673, 2011 - Yevgeny Seldin, Peter Auer, François Laviolette, John Shawe-Taylor, and Ronald Ortner. PAC-Bayesian analysis of contextual bandits. In
*Advances in Neural Information Processing Systems (NIPS)*, 2011 - Tobias Glasmachers and Christian Igel. Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters.
*IEEE Transactions on Pattern Analysis and Machine Intelligence*32(8), pp. 1522-1528, 2010 source code - Yevgeny Seldin and Naftali Tishby. PAC-Bayesian analysis of co-clustering and beyond.
*Journal of Machine Learning Research*11, pp. 3595−3646, 2010 - Thorsten Suttorp, Nikolaus Hansen, and Christian Igel. Efficient Covariance Matrix Update for Variable Metric Evolution Strategies.
*Machine Learning*75, pp. 167-197, 2009 source code - Verena Heidrich-Meisner and Christian Igel. Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search. In L. Bottou and M. Littman, eds.:
*Proceedings of the International Conference on Machine Learning (ICML 2009)*, pp. 401-408, 2009 - Christian Igel, Verena Heidrich-Meisner, and Tobias Glasmachers. Shark.
*Journal of Machine Learning Research*9, pp. 993-996, 2008 source code - Tobias Glasmachers and Christian Igel. Maximum-Gain Working Set Selection for SVMs.
*Journal of Machine Learning Research*7, pp. 1437-1466, 2006 source code

## People

Name | Title | Phone | |
---|---|---|---|

Gieseke, Fabian Christian | Associate Professor | ||

Igel, Christian | Professor | +4535335674 | |

Krause, Oswin | Associate Professor | ||

Seldin, Yevgeny | Professor | +4530450082 | |

Talebi, Sadegh | Assistant Professor - Tenure Track | +4531539770 |

## Contact

**Christian Igel**

Professor

igel@di.ku.dk