Active and Adaptive Learning from Biased Data with Applications in Astronomy

Research output: Book/Report › Ph.D. thesis › Research

Standard

Active and Adaptive Learning from Biased Data with Applications in Astronomy. / Kremer, Jan.

Department of Computer Science, Faculty of Science, University of Copenhagen, 2016. 91 p.

Harvard

Kremer, J 2016, Active and Adaptive Learning from Biased Data with Applications in Astronomy. Department of Computer Science, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122203254805763>

APA

Kremer, J. (2016). Active and Adaptive Learning from Biased Data with Applications in Astronomy. Department of Computer Science, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122203254805763

Vancouver

Kremer J. Active and Adaptive Learning from Biased Data with Applications in Astronomy. Department of Computer Science, Faculty of Science, University of Copenhagen, 2016. 91 p.

Author

Kremer, Jan. / Active and Adaptive Learning from Biased Data with Applications in Astronomy. Department of Computer Science, Faculty of Science, University of Copenhagen, 2016. 91 p.

Bibtex

@phdthesis{19a918b73aa04eb088ba83bb175be56a,
title = "Active and Adaptive Learning from Biased Data with Applications in Astronomy",
abstract = "This thesis addresses the problem of machine learning from biased datasets in the context of astronomical applications. In astronomy there are many cases in which the training sample does not follow the true distribution. The thesis examines different types of biases and proposes algorithms to handle them. During learning and when applying the predictive model, active learning enables algorithms to select training examples from a pool of unlabeled data and to request the labels. This allows for selecting examples that maximize the algorithm's accuracy despite an initial bias in the training set. Against this background, the thesis begins with a survey of active learning algorithms for the support vector machine. If the cost of additional labeling is prohibitive, unlabeled data can often be utilized instead and the sample selection bias can be overcome through domain adaptation, that is, minimizing the discrepancy between training sample and the true distribution. A simple method consists of weighting the elements of the training sample such that the empirical risk becomes an unbiased estimator of the true distribution's risk. The respective weights can be computed as the probability density ratio of training and test distribution. A model selection criterion, which is known in the context of kernel-based weight estimators, is proposed to be combined with a nearest neighbor density ratio estimator. It is shown to compare favorably to alternative approaches when applied to large-scale problems with low-dimensional feature spaces: a common setting in astronomical applications such as photometric redshift estimation. Another form of bias stems from label noise. This thesis considers the scenario in which unreliable labels can be replaced by highly accurate labels at a certain cost. This is, for example, the case in crowd-sourcing, where unreliable labelers can be corrected by experts, or in astronomy, where a labeling based on photometric data can be improved by spectroscopic observations. An algorithm to actively select objects for correction under a limited re-labeling budget is presented. It is shown empirically to converge faster to the maximally attainable accuracy than the state of the art.",
author = "Jan Kremer",
year = "2016",
language = "English",
publisher = "Department of Computer Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Active and Adaptive Learning from Biased Data with Applications in Astronomy

AU - Kremer, Jan

PY - 2016

Y1 - 2016

N2 - This thesis addresses the problem of machine learning from biased datasets in the context of astronomical applications. In astronomy there are many cases in which the training sample does not follow the true distribution. The thesis examines different types of biases and proposes algorithms to handle them. During learning and when applying the predictive model, active learning enables algorithms to select training examples from a pool of unlabeled data and to request the labels. This allows for selecting examples that maximize the algorithm's accuracy despite an initial bias in the training set. Against this background, the thesis begins with a survey of active learning algorithms for the support vector machine. If the cost of additional labeling is prohibitive, unlabeled data can often be utilized instead and the sample selection bias can be overcome through domain adaptation, that is, minimizing the discrepancy between training sample and the true distribution. A simple method consists of weighting the elements of the training sample such that the empirical risk becomes an unbiased estimator of the true distribution's risk. The respective weights can be computed as the probability density ratio of training and test distribution. A model selection criterion, which is known in the context of kernel-based weight estimators, is proposed to be combined with a nearest neighbor density ratio estimator. It is shown to compare favorably to alternative approaches when applied to large-scale problems with low-dimensional feature spaces: a common setting in astronomical applications such as photometric redshift estimation. Another form of bias stems from label noise. This thesis considers the scenario in which unreliable labels can be replaced by highly accurate labels at a certain cost. This is, for example, the case in crowd-sourcing, where unreliable labelers can be corrected by experts, or in astronomy, where a labeling based on photometric data can be improved by spectroscopic observations. An algorithm to actively select objects for correction under a limited re-labeling budget is presented. It is shown empirically to converge faster to the maximally attainable accuracy than the state of the art.

AB - This thesis addresses the problem of machine learning from biased datasets in the context of astronomical applications. In astronomy there are many cases in which the training sample does not follow the true distribution. The thesis examines different types of biases and proposes algorithms to handle them. During learning and when applying the predictive model, active learning enables algorithms to select training examples from a pool of unlabeled data and to request the labels. This allows for selecting examples that maximize the algorithm's accuracy despite an initial bias in the training set. Against this background, the thesis begins with a survey of active learning algorithms for the support vector machine. If the cost of additional labeling is prohibitive, unlabeled data can often be utilized instead and the sample selection bias can be overcome through domain adaptation, that is, minimizing the discrepancy between training sample and the true distribution. A simple method consists of weighting the elements of the training sample such that the empirical risk becomes an unbiased estimator of the true distribution's risk. The respective weights can be computed as the probability density ratio of training and test distribution. A model selection criterion, which is known in the context of kernel-based weight estimators, is proposed to be combined with a nearest neighbor density ratio estimator. It is shown to compare favorably to alternative approaches when applied to large-scale problems with low-dimensional feature spaces: a common setting in astronomical applications such as photometric redshift estimation. Another form of bias stems from label noise. This thesis considers the scenario in which unreliable labels can be replaced by highly accurate labels at a certain cost. This is, for example, the case in crowd-sourcing, where unreliable labelers can be corrected by experts, or in astronomy, where a labeling based on photometric data can be improved by spectroscopic observations. An algorithm to actively select objects for correction under a limited re-labeling budget is presented. It is shown empirically to converge faster to the maximally attainable accuracy than the state of the art.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122203254805763

M3 - Ph.D. thesis

BT - Active and Adaptive Learning from Biased Data with Applications in Astronomy

PB - Department of Computer Science, Faculty of Science, University of Copenhagen

ER -

ID: 165437436