Tutorial: Space-Efficient Index Structures in Practice

Special event on Friday 27th June.

Tutorial: Space-Efficient Index Structures in Practice

By Dr. Simon Gog, Research Fellow, Melbourne School of Engineering, University of Melbourne, Australia.

Location: KUA2 – 15A.1.11

Address: Det Humanistiske Fakultet, Karen Blixens Vej 4, Bygning 15A, 1st floor 2300 Copenhagen S

Time: Morning session: 10.15-12.00; afternoon session: 15.15-17.00; evening session (dinner): 18:00-

Index structures are fundamental to efficiently support search as well as statistical computations over large data sets. Traditional index structures supporting these operations often occupy space multiple times larger than the original data. This makes traditional indexes unfeasible for scenarios where the original data just fits into main memory as the index structure would have to reside on lower levels of the memory hierarchy.

In the last two decades, many space-efficient index structures have been proposed. These structures occupy space equivalent to size of the compressed input while still supporting operations efficiently. Moreover, they often eliminate the need to store the original data as it can be efficiently reconstructed from the index.

In this tutorial, I will present a selection of space-efficient index structures which can be applied to domains such as Bioinformatics and Information Retrieval. I will first introduce simple structures such as compressed bitvector representations which
represent the building blocks of more complex index structures such as Compressed Suffix Trees (CST) and Compressed Suffix Arrays (CSA).

Additionally we will explore Wavelet Trees, Range Minimum Query (RMQ) Structures and succinct tree representations. This tutorial will cover both the theoretical underpinnings of these structures as well as a practical implementation aspects. In particular, we will explore the usage, implementation and composability of the presented structures in the context of the Succinct Data Structure Library (SDSL). The SDSL provides a generic, highly engineered library toolbox which contains implementations of the presented data structures. I will show how to use this toolbox to construct and query existing structures and how you can tailor your own structure out of its components.

SEA 2014 Conference

Special event on Friday 27th June.

Tutorial: Space-Efficient Index Structures in Practice