Application of Machine Learning Algorithms to Credit Scoring

MSc thesis defense by Kasper Nybo Hansen


Credit scoring is widely used in the financial world to model and understand the risk associated with issuing loans.

In this thesis I have investigated whether Danske Bank could use machine learning algorithms to perform credit scoring. Machine learning algorithms have the advantage that only a minimum of human involvement is necessary, which reduces financial costs and can allow for accurate, unbiased decisions.

Several well-known supervised binary classification algorithms are investigated on a real-world dataset supplied by Danske Bank. The dataset contains a large number of missing entries, so the missing data problem is also investigated.
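To make the missing data problem concrete, the sketch below measures the fraction of missing entries per feature and applies mean imputation, one of the simplest pre-processing fixes. It assumes missing entries are encoded as None; the tiny table and the function names are hypothetical and are not taken from the Danske Bank data.

```python
# Hypothetical toy table: each inner list is one applicant, None marks a missing entry.
# This is illustrative data only, not the Danske Bank dataset.

def missing_fraction(rows):
    """Return the fraction of missing (None) entries in each column."""
    n_cols = len(rows[0])
    return [sum(r[j] is None for r in rows) / len(rows) for j in range(n_cols)]

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

rows = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, None]]
print(missing_fraction(rows))  # [0.25, 0.5]
print(mean_impute(rows))       # [[1.0, 6.0], [3.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
```

Mean imputation is cheap but fills every gap with the same value, which can distort the feature distributions when, as here, many entries are missing.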

A testing framework has been developed that uses 10-fold cross-validation to measure the accuracy of the classifiers. Model selection is performed with nested cross-validation and standard grid search.
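As a minimal sketch of how nested cross-validation and grid search fit together, the pure-Python code below uses a toy k-nearest-neighbour classifier whose only hyperparameter is the neighbour count. The classifier, the synthetic two-cluster data, the grid [1, 3, 5], the 3 inner folds, and names such as `nested_cv` are hypothetical illustrations, not the classifiers or framework used in the thesis.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k):
    """Hypothetical toy classifier: majority vote of the k nearest training points."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
                   for xi, yi in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def split(X, y, test_idx):
    """Split data into (train_X, train_y, test_X, test_y) for the given test fold."""
    test_set = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in test_set]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def cv_score(X, y, k_param, n_folds, seed):
    """Mean accuracy of one hyperparameter value under n_folds-fold cross-validation."""
    scores = [accuracy(*split(X, y, f), k_param)
              for f in k_fold_indices(len(X), n_folds, seed)]
    return sum(scores) / len(scores)

def nested_cv(X, y, grid, outer_folds=10, inner_folds=3, seed=0):
    """Outer loop estimates accuracy; inner grid search picks k on each outer training split."""
    outer_scores, chosen = [], []
    for test_idx in k_fold_indices(len(X), outer_folds, seed):
        tr_X, tr_y, te_X, te_y = split(X, y, test_idx)
        # Grid search sees only the outer training split, so the outer test fold stays unseen.
        best_k = max(grid, key=lambda g: cv_score(tr_X, tr_y, g, inner_folds, seed))
        chosen.append(best_k)
        outer_scores.append(accuracy(tr_X, tr_y, te_X, te_y, best_k))
    return sum(outer_scores) / len(outer_scores), chosen

# Synthetic, well-separated two-class data (illustrative only).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] +
     [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20
score, ks = nested_cv(X, y, grid=[1, 3, 5])
print(f"estimated accuracy: {score:.2f}, chosen k per outer fold: {ks}")
```

The point of the nesting is that the accuracy reported by the outer loop is never computed on data that influenced the choice of hyperparameter, so it is not optimistically biased by the grid search.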

A modified version of the random forest algorithm has been developed. It can handle data containing missing entries without pre-processing such as imputation.
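The abstract does not describe how the modified random forest handles missing entries, so the sketch below shows one known alternative rather than the thesis's method: at each split, rows with a missing value are sent to whichever child ("left" or "right") gives the lower weighted Gini impurity, similar in spirit to the learned default direction in XGBoost's sparsity-aware split finding. It covers a single feature at a single node with binary 0/1 labels; a random forest would repeat this over a random subset of features at every node. The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary 0/1 labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split_with_missing(values, labels):
    """Return (weighted_gini, threshold, missing_direction) for one feature.

    Rows with a known value go left if value <= threshold, otherwise right.
    Rows whose value is None are routed to the side that gives the lower impurity,
    so no imputation is needed. Hypothetical sketch, not the thesis's algorithm.
    """
    known = [(v, lab) for v, lab in zip(values, labels) if v is not None]
    missing = [lab for v, lab in zip(values, labels) if v is None]
    n = len(labels)
    best = (float("inf"), None, None)
    # Skip the largest value so the right child of known values is never empty.
    for t in sorted({v for v, _ in known})[:-1]:
        left = [lab for v, lab in known if v <= t]
        right = [lab for v, lab in known if v > t]
        for direction in ("left", "right"):
            l_side = left + missing if direction == "left" else left
            r_side = right + missing if direction == "right" else right
            score = (len(l_side) * gini(l_side) + len(r_side) * gini(r_side)) / n
            if score < best[0]:
                best = (score, t, direction)
    return best

values = [1.0, 2.0, None, 8.0, 9.0, None]
labels = [0, 0, 1, 1, 1, 1]
print(best_split_with_missing(values, labels))  # (0.0, 2.0, 'right')
```

At prediction time, a tree built this way would send a sample with a missing value down the stored direction for that node.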

Supervisor: Christian Igel (DIKU)

Censor: Lars Frank (CBS)