School Drop Out Prediction – Københavns Universitet

School Drop Out Prediction

Master's thesis defence by Nicolæ-Bogdan Sara on Wednesday, May 14 at 16:30.

Abstract

One of the biggest problems in education today is pupils dropping out of school. Governments lose a significant amount of money for each student who drops out, and the quality of education decreases substantially. This has a negative influence on life expectancy, health, political interest and even happiness.

This is a company thesis done in collaboration with MaCom A/S, which provides digital education services to the high schools in Denmark through Lectio. The company is continuously investigating new ways of extracting useful information from its current data, and several research projects have been developed recently. Drop out prediction is of high interest to them and is the subject of my thesis.

In this work, I researched whether machine learning algorithms are suitable for notifying the interested parties about which students are likely to drop out in the near future, that is, within the next 3 months. For the experiments, only students who had finished the first six months of their education were used, given the lack of available data for these students.

I created a data model for each pupil by following an iterative process that involved school inspectors, teachers, MaCom A/S employees and the existing literature. Data was obtained from the company database and from external sources. Several binary classification algorithms were then tested on these data: Linear Discriminant Analysis, Random Forests, Support Vector Machines, Classification and Regression Trees, k-Nearest Neighbour and Naive Bayes. For model selection, a combination of grid search and cross-validation was used, followed by evaluation on test data.
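
The model selection procedure described above can be illustrated with a minimal sketch. It is not the thesis implementation: the data are synthetic, the two features are hypothetical stand-ins for pupil attributes, and the only classifier shown is a plain k-Nearest Neighbour whose single hyperparameter k is tuned. All function names are chosen for illustration only.

```python
import random

def make_data(n, seed):
    """Synthetic stand-in for pupil records: ((f1, f2), label), label 1 = drop out."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        mean = 2.0 if label else 0.0  # hypothetical class separation
        data.append(((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)), label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of rows in test that a k-NN fitted on train classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cross_val_score(data, k, folds=5):
    """Mean validation accuracy over `folds` disjoint validation splits."""
    scores = []
    for i in range(folds):
        val = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, val, k))
    return sum(scores) / folds

def grid_search(data, grid, folds=5):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(grid, key=lambda k: cross_val_score(data, k, folds))

data = make_data(300, seed=0)
train_set, test_set = data[:240], data[240:]  # the test data are held out from tuning

best_k = grid_search(train_set, grid=[1, 3, 5, 7, 9])
test_accuracy = accuracy(train_set, test_set, best_k)
print("selected k:", best_k, "test accuracy:", round(test_accuracy, 3))
```

The point of the design is that the grid search only ever sees the training portion, so the final test accuracy is measured on pupils that played no part in choosing the hyperparameter.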

The best result for the number of correctly classified drop out students was obtained using Naive Bayes, with an accuracy of 82.6 %. The best result for classifying both "drop out" and "not drop out" students was obtained by the Random Forests classifier, with an accuracy of 93.47 %. In conclusion, machine learning could definitely be used as the basis for a support tool that helps prevent pupils from dropping out of high school.
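
The two figures above measure different things, which is why different classifiers can come out best on each. The sketch below assumes that the first figure is the share of actual drop out students that were correctly identified and that the second is the share of all students classified correctly. The labels and predictions are made-up toy values, not thesis results.

```python
def overall_accuracy(y_true, y_pred):
    """Share of all students whose class was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_accuracy(y_true, y_pred, cls=1):
    """Share of the students who truly belong to `cls` that were predicted as `cls`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds) / len(preds)

# Toy example: 4 drop out students (1) and 16 others (0).
y_true = [1] * 4 + [0] * 16

# A cautious classifier that rarely flags anyone.
cautious = [1, 1, 0, 0] + [0] * 16
# An eager classifier that flags every drop out student, plus 4 false alarms.
eager = [1, 1, 1, 1] + [1] * 4 + [0] * 12

for name, pred in [("cautious", cautious), ("eager", eager)]:
    print(name, overall_accuracy(y_true, pred), class_accuracy(y_true, pred))
```

In this toy case the cautious classifier has the higher overall accuracy (0.9 against 0.8), while the eager one finds all drop out students (1.0 against 0.5).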

Time: Wednesday, May 14, 16:30
Place: HCØ 01-0-029, APL-gruppens Mødelokale, Universitetsparken 5
Supervisors: Christian Igel, Stephen Alstrup
Censor: Lars Frank, CBS