Declarative and Scalable Selection for Map Visualizations

PhD defense by Pimin Konstantin Kefaloukos

Geographic maps are among the oldest and most important data visualizations known to mankind. Moreover, their design and use have been studied for several millennia, and their subordinate processes, such as map generalization, are well understood. However, the web has impacted the field of mapping in a number of ways: (1) Ready-to-use background maps have become a commodity that is easily integrated into digital mapping applications. In contrast, thematic foreground layers offer great variability and cannot be commoditized to the same extent. Therefore, research that studies the creation of thematic foreground layers is merited. (2) The typical map-making professional has changed from a GIS specialist to a busy person with map making as a secondary skill. Today, thematic maps are produced by journalists, aid workers, amateur data enthusiasts, and scientists alike. Therefore, it is crucial that this diverse group of map makers is provided with easy-to-use and expressive thematic map design tools. Such tools should support customized selection of data for maps in scenarios where developer time is a scarce resource. (3) The Web provides access to massive data repositories for thematic maps and is itself a source and cause of prolific data creation. This calls for scalable map processing techniques that can handle the data volume and that play well with the predominant data models on the Web. (4) Maps are now consumed around the clock by a global audience. While historical maps were single-user or few-user interfaces, today's map interfaces must be designed for millions of people who concurrently consume maps anywhere and anytime.

To address the challenges of efficient point map design, scalable map processing, and scalable map distribution, this thesis proposes novel techniques in the area of database-supported maps: (a) Glossy SQL, and its prototype CVL, belong to a new class of concise declarative languages that allow map designers to specify holistic multi-scale selection of geographic information for maps. Glossy SQL supports biased random sampling of data subject to two types of user-defined constraints as well as custom objectives. The purpose of the language is to derive a target multi-scale database from a source database according to holistic specifications. (b) The Glossy SQL compiler allows Glossy SQL to be executed scalably in a spatial analytics system, such as a spatial relational database system. This technique allows unprecedented data volumes to be processed for maps. Scalable execution is achieved by translating Glossy SQL queries into pure relational algebra queries that can run natively in SQL-based spatial analytics systems. The implementation developed during this thesis supports the PostgreSQL dialect of SQL. The prototype implementation is a compiler that translates CVL into SQL and stored procedures. (c) TileHeat is a framework and basic algorithm for partial materialization of hot tile sets for scalable map distribution. The framework predicts future map workloads based on an access log of recent requests.
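The idea behind (c) can be illustrated with a minimal Python sketch. It is not the TileHeat algorithm from the thesis: it assumes, for illustration only, that the access log is a sequence of (zoom, x, y) tile identifiers and that the predicted workload is simply the most frequently requested tiles. The function name `hot_tiles` and the `budget` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical tile identifier: (zoom, x, y).
Tile = Tuple[int, int, int]


def hot_tiles(access_log: Iterable[Tile], budget: int) -> List[Tile]:
    """Return up to `budget` tiles with the most requests in the log.

    Assumption: past request frequency is used as a stand-in for
    predicted future demand. The thesis' prediction method may differ.
    """
    heat = Counter(access_log)
    # Rank by descending request count; break ties by tile id so the
    # result is deterministic.
    ranked = sorted(heat.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tile for tile, _ in ranked[:budget]]


# Toy access log of recent requests.
log = [(3, 1, 2), (3, 1, 2), (3, 4, 4), (5, 10, 12), (3, 1, 2), (5, 10, 12)]
selected = hot_tiles(log, budget=2)
```

Only the tiles in `selected` would be materialized ahead of time; all other tiles would be rendered on demand.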

The results show that Glossy SQL and CVL can be used to compute cartographic selection by processing one or more complex queries in a relational database. The scalability of the approach has been verified up to half a million objects in the database. Furthermore, there are indications that the method is scalable for databases that contain millions of records, especially if the target language of the compiler is substituted by a cluster-ready variant of SQL. While several realistic use cases for maps have been implemented in CVL, additional non-geographic data visualization use cases have been implemented for Glossy SQL. For example, it is shown that Glossy SQL can express selection for multi-scale one-dimensional data visualizations such as timelines. Finally, it is shown that multi-scale selection can be concisely expressed in Glossy SQL and CVL, which was an important design goal. The results for TileHeat show that the prediction method offers a substantial improvement over the current method used by the Danish Geodata Agency. Thus, a large amount of computation can potentially be saved by this public institution, which is responsible for the distribution of government-curated geographic data in Denmark.
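To make the notion of multi-scale selection on a timeline concrete, the following Python sketch selects events per zoom level under a simple density constraint. It is written in Python rather than Glossy SQL or CVL, whose syntax is not shown here, and it rests on several assumptions: the constraint is "at most k events per cell", cells halve in width with each zoom level, and the highest-weighted events are kept deterministically instead of by biased random sampling. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event record: (name, time, weight).
Event = Tuple[str, float, float]


def select_per_zoom(
    events: List[Event], max_zoom: int, span: float, k: int
) -> Dict[int, List[str]]:
    """For each zoom level, keep at most k events per timeline cell.

    Assumption: the timeline covers [0, span) and is split into
    2**zoom equal cells at each zoom level.
    """
    result: Dict[int, List[str]] = {}
    for zoom in range(max_zoom + 1):
        width = span / (2 ** zoom)
        cells = defaultdict(list)
        for ev in events:
            cells[int(ev[1] // width)].append(ev)
        visible: List[str] = []
        for members in cells.values():
            # Highest weight first; ties broken by name.
            members.sort(key=lambda e: (-e[2], e[0]))
            visible.extend(e[0] for e in members[:k])
        result[zoom] = sorted(visible)
    return result


events = [("a", 1.0, 5.0), ("b", 2.0, 3.0), ("c", 6.0, 4.0), ("d", 7.0, 1.0)]
levels = select_per_zoom(events, max_zoom=2, span=8.0, k=1)
```

Because the cells are nested, an event that is shown at a coarse zoom level remains visible at every finer level in this sketch.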

Assessment Committee:

Chairman: Associate Professor Erik Frøkjær, Department of Computer Science, University of Copenhagen

Member 1: Associate Professor Lars Harrie, Lund University

Member 2: Associate Professor Jørgen Villadsen, Technical University of Denmark

Academic advisors: Professor Martin Zachariasen and Assistant Professor Marcos Vaz Salles, Department of Computer Science, University of Copenhagen 

For an electronic copy of the thesis, please contact Jette Giovanni Møller,