# Three individual Marie Skłodowska-Curie Fellowship grants for DIKU

In January 2017, the Faculty of Science (SCIENCE) at the University of Copenhagen was awarded nearly half of all individual Marie Skłodowska-Curie Fellowship grants in Denmark, when the European Commission invited 27 of SCIENCE's applicants to contract negotiations. Three of the grants went to DIKU.

The three projects are:

## Provably-Correct Efficient Algorithms for Clustering

*Researcher: Vincent Viallat Cohen-Addad*

Clustering data according to similarity is ubiquitous in computer and data science. Similarity between data is often modeled by a distance function: two data points are close if they are similar. This induces a metric space in which each data point is associated with a point of the space. Thus, a clustering according to similarity is a partition of the points such that the distance between two points in the same part is small. Clustering problems therefore play a crucial role in extracting information from massive datasets across many research areas. However, the problem is hard to formalise: the soundness of a particular clustering often depends on the structure of the data. This induces a gap between theory and practice: on the one hand, no guarantees can be proven for the algorithms used in practice; on the other hand, the best theoretical algorithms turn out to be noncompetitive in practice.
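To make the setup above concrete, here is a minimal sketch (not code from the PEAC project) of clustering in a metric space: points live in the plane, the metric is Euclidean distance, and the k-median objective charges each point its distance to the nearest chosen center. All names are illustrative.

```python
import math

def dist(p, q):
    """Euclidean distance: the metric inducing the space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmedian_cost(points, centers):
    """k-median objective: sum of distances to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

# Two tight groups; placing one center in each keeps the cost small.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(kmedian_cost(points, [(0, 0), (10, 10)]))  # → 3.0
```

A partition in which intra-part distances are small corresponds to a center set with low objective value, which is why the k-median cost is a standard way to formalise "clustering according to similarity".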

By focusing on both the algorithms and inputs that are relevant in practice, the PEAC project aims at rigorously analysing the cutting-edge heuristics and designing more efficient algorithms that are provably-correct for both clustering and hierarchical clustering (HC), bridging a gap between theory and practice.

Very recently, it was shown that a widely-used local search (LS) algorithm achieves the best approximation guarantees for some specific classes of inputs. We plan to design a faster LS-based algorithm for those types of inputs, achieving both better running time and better approximation guarantees than the best heuristics. We will also design a non-oblivious LS algorithm to obtain an approximation for k-median better than the current factor of 2.675.
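The single-swap local search schema discussed above can be sketched as follows: start from arbitrary centers and repeatedly swap one center for one non-center whenever that lowers the k-median cost. This toy loop only illustrates the algorithm template; the project's contribution concerns its speed and guarantees on structured inputs, which this sketch does not capture.

```python
import math
from itertools import product

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cost(points, centers):
    # k-median objective: total distance to the nearest center.
    return sum(min(dist(p, c) for c in centers) for p in points)

def local_search_kmedian(points, k):
    centers = set(points[:k])  # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        # Try every single swap (remove one center, add one non-center).
        for out_c, in_c in product(list(centers), points):
            if in_c in centers:
                continue
            candidate = (centers - {out_c}) | {in_c}
            if cost(points, candidate) < cost(points, centers):
                centers, improved = candidate, True
                break
    return centers  # a local optimum under single swaps

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(local_search_kmedian(pts, 2)))  # → [(0, 0), (10, 10)]
```

Each iteration inspects O(nk) swaps and recomputes the cost; the analysis referenced above shows that on suitably structured inputs this simple scheme already matches the best known approximation guarantees.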

Dasgupta recently introduced a cost function for HC. Using this cost function, we plan to analyse the performance of widely-used heuristics for HC (e.g. average-linkage and bisection k-means). We will characterize real-world inputs and use the cost function to design more efficient, provably-correct algorithms for HC.
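Dasgupta's cost function can be sketched concretely: given a similarity graph and a hierarchical clustering tree, each similarity edge {i, j} pays its weight times the number of leaves under the lowest common ancestor of i and j, so trees that merge similar points early are cheaper. The code below is an illustrative toy, not the project's implementation; trees are nested tuples.

```python
def leaves(tree):
    """All leaves under a node (a leaf is any non-tuple label)."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def dasgupta_cost(tree, edges):
    """edges: dict mapping frozenset({i, j}) -> similarity weight."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    n = len(leaves(tree))
    # Edges split at this node have their LCA here and pay w * n.
    cut = sum(w for e, w in edges.items()
              if len(e & leaves(left)) == 1 and len(e & leaves(right)) == 1)
    return cut * n + dasgupta_cost(left, edges) + dasgupta_cost(right, edges)

# a-b are twice as similar as b-c, so merging {a, b} first is cheaper.
edges = {frozenset({'a', 'b'}): 2.0, frozenset({'b', 'c'}): 1.0}
print(dasgupta_cost((('a', 'b'), 'c'), edges))  # → 7.0
print(dasgupta_cost(('a', ('b', 'c')), edges))  # → 8.0
```

Evaluating heuristics such as average-linkage under this single objective is what makes a rigorous comparison, and provable guarantees, possible.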

## Monotonicity in Logic and Complexity

*Researcher: Anupam Das*

MiLC will develop logical characterisations of monotone complexity classes, yielding languages and systems which are machine-independent and well suited for reasoning over such classes of functions. Monotone Boolean functions abound in the theory of computation, e.g. in sorting algorithms and clique detection in graphs, and nonuniform classes of monotone functions have been well studied in computational complexity under the lens of monotone circuits.

From the point of view of computation, monotone functions are computed by algorithms not using negation, and this will lead to several recursion-theoretic characterisations of feasible classes such as monotone P, NC^i, AC^i, and the polynomial hierarchy. The main purpose of MiLC will be to capture these classes proof-theoretically, by calibrating each class with the formally representable functions of a certain theory. MiLC will work in the setting of Bounded Arithmetic, since its techniques are well suited to handling monotonicity, building on recently discovered correspondences with monotone proof complexity. To this end, two avenues for controlling monotonicity will be investigated: (a) restricting negation in proofs, inducing monotone witnessing invariants, and (b) restricting structural rules of the underlying logic to eliminate the nonmonotone cases of witness extraction. The aim is to arrive at modular characterisations, where monotonicity of a represented class is switched on or off by the inclusion or exclusion, respectively, of certain structural rules.
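To make "computed without negation" concrete, here is a toy monotone circuit for majority-of-3, built from AND and OR gates only (an illustrative example, not part of MiLC). Monotonicity means exactly that flipping any input from 0 to 1 can never flip the output from 1 to 0, which the check below verifies exhaustively.

```python
from itertools import product

def maj3(x, y, z):
    """Monotone circuit: OR of pairwise ANDs, with no NOT gates."""
    return (x & y) | (y & z) | (x & z)

# Pointwise monotonicity check: raising any single input never
# lowers the output.
for x, y, z in product((0, 1), repeat=3):
    for flipped in ((1, y, z), (x, 1, z), (x, y, 1)):
        assert maj3(*flipped) >= maj3(x, y, z)

print(maj3(1, 1, 0), maj3(1, 0, 0))  # → 1 0
```

Sorting is the classic example of the same phenomenon: a comparator network on Boolean inputs computes threshold functions, all of which are monotone, so no negation is ever needed.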

Finally MiLC will calibrate these theories with well studied systems in proof complexity, namely monotone, intuitionistic and deep inference systems, under the usual correspondence between theories of Bounded Arithmetic and systems of propositional logic. These tight correspondences ensure that the tools developed in MiLC may be employed to attack certain open problems in the area, reformulating and improving existing bounds.

## VIsual representations of VIew Relations to support effective data analysis on large and high-resolution displays (VIVIR)

*Researcher: Søren Knudsen*

In VIVIR, I will conduct state-of-the-art research on supporting collaborative face-to-face data analysis. This is motivated by the increasing need for interdisciplinary teams to collaborate on understanding and analysing data. Additionally, as the scale and complexity of data increase, so does the demand for data-based insights and decision-making. My approach is to empower people working with large and complex data by letting them lay out as many visualization views (in the following, denoted views) as necessary on large displays, and by creating specialized meta-visualizations to show relations between these views. These meta-visualizations will allow team members to stay aware of each other's work and of the changing view and data relationships as they work. While the potential of view meta-visualizations has been acknowledged, only a few examples of such meta-visualizations are currently in frequent use and considered essential. These might show that data in two views are compared in a third view, or that one view shows a subset of the data shown in another. Most importantly, there has been no thorough exploration of the power and potential of meta-visualization support for data-driven decision-making. To understand the potential impact of meta-visualizations on data analysis, we need a structured approach that formalizes these possibilities, which will improve our ability to support teams of knowledge workers as they face the challenges of analyzing increasingly complex data.

In brief, data can be difficult to understand. Creating visualizations of data lets people see their data more clearly. As data size and complexity increase, more views are needed to reveal the information hidden in the data. Large displays can help address this. However, a new problem then emerges: how to stay aware of the data relationships across these multiple views, and how to keep an overview of analysis provenance, findings, and decisions. VIVIR tackles this issue.
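A minimal sketch of the bookkeeping such meta-visualizations could draw from: views as nodes and typed view relations ("subset-of", "compared-in") as labeled edges. All names and the structure are hypothetical, purely to make the idea of view relations concrete; the project's actual designs are not described here.

```python
# Hypothetical view-relation graph: each view has some associated data,
# and relations record how views depend on one another.
views = {
    "A": {"data": "sales-2016"},
    "B": {"data": "sales-2016/Q4"},  # a filtered subset of view A
    "C": {"data": "comparison"},
}
relations = [
    ("B", "subset-of", "A"),      # B shows a subset of A's data
    ("A", "compared-in", "C"),    # A and B are compared in view C
    ("B", "compared-in", "C"),
]

def related(view, kind):
    """Views connected to `view` by a relation of the given kind."""
    return sorted(src for src, k, dst in relations
                  if k == kind and dst == view)

print(related("C", "compared-in"))  # → ['A', 'B']
print(related("A", "subset-of"))    # → ['B']
```

A meta-visualization would render this graph alongside the views themselves, so collaborators can see at a glance which views derive from or compare which data.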