Application of Machine Learning Algorithms to Credit Scoring

MSc thesis defense by Kasper Nybo Hansen

Abstract:

Credit scoring is widely used in the financial world to model and understand the risk associated with issuing loans.

In this thesis I investigate whether Danske Bank can use machine learning algorithms to perform credit scoring. Machine learning algorithms have the advantage of requiring minimal human involvement, which reduces financial costs and allows for accurate, unbiased decisions.

Several well-known supervised binary classification algorithms are investigated on a real-world dataset supplied by Danske Bank. The dataset contains a large number of missing entries, so the missing data problem is also investigated.

A testing framework has been developed. The framework uses 10-fold cross-validation to measure the accuracy of the classifiers. Nested cross-validation and standard grid search methods are used to perform model selection.
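As an illustration only, the evaluation scheme described above can be sketched in a few lines of Python. This is not the framework from the thesis: the k-nearest-neighbour classifier, the synthetic two-cluster data, the 5-fold inner loop, the parameter grid and all function names are assumptions chosen to keep the example self-contained. The point it shows is the nesting: the grid search in the inner loop only ever sees the training part of each outer fold, so the outer 10-fold estimate is not influenced by model selection.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Binary majority vote among the k nearest training points (stand-in classifier)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def split(X, y, held_out):
    """Return (train_X, train_y, test_X, test_y) for one held-out fold."""
    held = set(held_out)
    train = [i for i in range(len(X)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held_out], [y[i] for i in held_out])


def cross_val_accuracy(X, y, k, n_folds, seed):
    """Plain n_folds cross-validation accuracy for one hyperparameter value."""
    return mean(accuracy(*split(X, y, fold), k)
                for fold in k_fold_indices(len(X), n_folds, seed))


def nested_cv(X, y, grid, outer_folds=10, inner_folds=5, seed=0):
    """Outer loop estimates accuracy; inner loop grid-searches on training data only."""
    scores, chosen = [], []
    for fold in k_fold_indices(len(X), outer_folds, seed):
        tr_X, tr_y, te_X, te_y = split(X, y, fold)
        best_k = max(grid, key=lambda k: cross_val_accuracy(tr_X, tr_y, k, inner_folds, seed + 1))
        chosen.append(best_k)
        scores.append(accuracy(tr_X, tr_y, te_X, te_y, best_k))
    return mean(scores), chosen


def make_toy_data(n_per_class=50, seed=42):
    """Two synthetic Gaussian clusters; purely illustrative, not bank data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
grid = [1, 3, 5]  # odd neighbour counts avoid ties in the binary vote
score, chosen = nested_cv(X, y, grid)
print(f"nested CV accuracy: {score:.3f}")
print(f"k selected in each outer fold: {chosen}")
```

The selected hyperparameter may differ between outer folds. That is expected: nested cross-validation estimates the accuracy of the whole procedure, including model selection, rather than of one fixed model.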

A modified version of the random forest algorithm has been developed. It can handle data with missing entries without the need for pre-processing such as imputation.

Supervisor: Christian Igel (DIKU)

Censor: Lars Frank (CBS)