Surveying the sky using machine learning (SkyML)
The SkyML project has ended. See the publication list below for our achievements during the three-year project period. The review article
Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy. IEEE Intelligent Systems 32(2), pp. 16-22, 2017.
summarizes some of our findings.
Original project description
Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large aperture telescopes has led to ever more ambitious surveys of the sky. The data volume of an entire survey of a decade ago can now be acquired in a single night and real-time analysis is often desired.
Our goal is to advance astrophysics research by developing efficient and specialized machine learning (ML) and image analysis techniques for these large-scale survey data. We will work on the wealth of data already available and will prepare for planned missions such as Gaia, the Large Synoptic Survey Telescope (LSST), and the Euclid satellite. These missions will collect an even larger data volume consisting of hundreds of data points for each of more than a billion objects. The magnitude of these surveys makes manual examination impossible. Advanced ML systems can solve this problem by automating the analysis. They are able to uncover the relation between input data (e.g., galaxy images) and outputs (e.g., galaxy physical properties) based on input-output samples. However, there are no ready-made solutions; data analysis in astronomy and cosmology poses scientific challenges to ML research, and we will develop novel algorithms to address them. Because of the large amounts of data and the time constraints when observing time-variable targets, we need highly efficient methods. Furthermore, the learning algorithms must cope with theoretical and practical problems due to sample selection bias: in astronomy, the distributions of training and testing data (the data for building and applying models, respectively) are often substantially different. This mismatch arises because training sets come only from old surveys, while upcoming missions will probe never-before-seen regions in the astrophysical parameter space. Such systematic differences between samples in the training and testing data have to be addressed by the learning system.
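One common way to address such a mismatch is importance weighting: each training example is reweighted by an estimate of how much more (or less) likely it is under the test distribution than under the training distribution. The following is a minimal sketch under stated assumptions, not the project's method. It assumes a single one-dimensional feature scaled to [0, 1] and uses a crude histogram density estimate; the function name `importance_weights`, its parameters, and the toy values are hypothetical.

```python
from collections import Counter


def importance_weights(train_x, test_x, n_bins=5, lo=0.0, hi=1.0):
    """Histogram estimate of p_test(x) / p_train(x) for each training point.

    Hypothetical helper for illustration only: one feature, fixed-width bins.
    """
    def bin_index(x):
        # Map x to a bin and clamp, so that x == hi lands in the last bin.
        i = int((x - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)

    train_counts = Counter(bin_index(x) for x in train_x)
    test_counts = Counter(bin_index(x) for x in test_x)

    weights = []
    for x in train_x:
        b = bin_index(x)
        # The bin of a training point always contains at least that point,
        # so the training density estimate is never zero here.
        p_train = train_counts[b] / len(train_x)
        p_test = test_counts[b] / len(test_x)
        weights.append(p_test / p_train)
    return weights


# Toy data (assumed): the training set is dominated by small feature values,
# the test set by large ones, mimicking an old survey versus a new mission.
train_feature = [0.1, 0.1, 0.1, 0.9]
test_feature = [0.1, 0.9, 0.9, 0.9]
weights = importance_weights(train_feature, test_feature, n_bins=2)
```

In this toy case the single training example with a large feature value receives a weight above one, while the over-represented small values are down-weighted. A histogram estimate does not scale to high-dimensional inputs such as images, so it should be read only as an illustration of the idea.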
We will consider both transient event detection and galaxy classification. Transient events are unpredictable, short-lived changes (lasting between microseconds and weeks) in astrophysical objects, for instance a supernova or variable star. We strive for new methods for detecting them more reliably and quickly, enabling time-critical follow-up observations.
Understanding galaxies and their evolution has been a prime concern for astrophysicists since the time of Edwin Hubble. Large surveys now collect images for millions of galaxies. We will develop image analysis and ML techniques to improve the classification of galaxy morphology and the estimation of other physical parameters. Methodologically, we will focus on (multi-class) support vector machines (SVMs), which are well understood theoretically and provide excellent classification performance. To apply them to large-scale survey data, we will develop efficient online learning algorithms for consistent multi-class SVMs. Both the learning and the evaluation will be scaled up by exploiting multi-core hardware architectures. We will develop methods to tame sample selection bias for SVMs based on cost-sensitive learning and new variants of active learning, which has been shown to increase the accuracy of photometric variable star classification. For image analysis, we will employ tailored local image features capturing both image structure and texture. The aspired methodological improvements are driven by our applications, but shall advance the field of ML in general. Data for the project is available in the form of existing surveys; however, our long-term goal is to prepare for upcoming missions, in particular Euclid.
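As a rough illustration of how cost-sensitive learning and online training fit together, the sketch below runs stochastic subgradient descent on a cost-weighted hinge loss, processing one example at a time. It is a deliberately simplified stand-in (binary, linear, no bias term) and not the consistent multi-class SVMs the project targets. The function names, the fixed step size `eta`, the regularization strength `lam`, and the toy data are all assumptions made for illustration.

```python
def train_cost_sensitive_svm(samples, labels, costs, lam=0.01, eta=0.1, epochs=20):
    """Online subgradient descent on a cost-weighted hinge loss.

    Hypothetical simplification: binary labels in {-1, +1}, linear model,
    no bias term, constant step size.
    """
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y, c in zip(samples, labels, costs):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # L2 regularization: shrink the weights slightly on every step.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active; scale the update by the example's cost.
                w = [wi + eta * c * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1


# Toy, linearly separable data (assumed); the negative class is given a
# higher misclassification cost than the positive class.
X = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]
y = [1, 1, -1, -1]
costs = [1.0, 1.0, 3.0, 3.0]

w = train_cost_sensitive_svm(X, y, costs)
predictions = [predict(w, x) for x in X]
```

In a selection-bias setting, the per-example costs could be set to estimated ratios between test and training densities, so that training examples resembling the target survey count more. Because each update touches a single example, the same loop can in principle consume data as a stream.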
Core team
Peer-reviewed Publications
Popular Science Publications
MSc Projects and Theses
Jens Patrick Raaby. An objective categorization of auroral substorms - Exploring large scale morphology. MSc in Computer Science, University of Copenhagen, 2014.
Contact
For information regarding SkyML, please contact Christian Igel or Kim Steenstrup Pedersen.
Funding
The project is funded by the Danish Council for Independent Research, Natural Sciences.