MSc Thesis Defense by Johs Kristoffersen


High schools and universities alike face the issue of student retention. A key part of maintaining retention is targeted preventive action. This thesis explores the possibility of identifying potential dropouts using machine learning techniques. In collaboration with Macom A/S, using their High school eLearning system Lectio, four separate education-level datasets were extracted, representing the first half year, the second year and the rest of the education. These datasets build upon the work done in Şara's 2014 thesis [18], which was synthesized into an article [17]. Inspiration for new features was found in the literature and through an interview. Some of these features were extracted from a social network built for each school.

In collaboration with the Department of Computer Science at Copenhagen University, using their eLearning system Absalon, two course-level datasets were created, containing mainly the performance of students in first-year courses. The first dataset contains the full information for the whole course, while the second uses only data from the first half. Using re-sampling techniques and classification algorithms such as Naive Bayes, Support Vector Machine, Classification And Regression Trees and variants of Random Forest, it was possible to identify 77-79.5 % of the dropouts in High school, except for the first half year, where 67.35 % were identified. For the University students, 77.88 % of the dropouts could be predicted halfway through the course, and 90.22 % using information about all the assignments in the course. An interesting observation was the bias Random Forest has towards the majority class, especially in extremely imbalanced datasets, which leads to very low True Positive Rates.
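To illustrate why the True Positive Rate matters on imbalanced data, and what a re-sampling step can look like, here is a minimal Python sketch. It is not the thesis's implementation: the toy data (95 staying students, 5 dropouts), the function names and the choice of plain random oversampling are all assumptions made for illustration, and the abstract does not say which re-sampling techniques were actually used.

```python
import random
from collections import Counter


def true_positive_rate(y_true, y_pred, positive=1):
    """Fraction of actual positives (here: dropouts) predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no positive examples in y_true")
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)


def accuracy(y_true, y_pred):
    """Fraction of all examples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    is as frequent as the largest one (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up toy data: 95 students who stay (0) and 5 who drop out (1).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A classifier that always predicts the majority class looks accurate
# but identifies no dropouts at all.
always_stay = [0] * 100
print(accuracy(labels, always_stay))            # 0.95
print(true_positive_rate(labels, always_stay))  # 0.0

# After oversampling, both classes are equally frequent in the training data.
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

The always-majority predictor reaches 95 % accuracy with a True Positive Rate of zero, which is the failure mode that a majority-class bias pushes a classifier towards. Balancing the training set is one common way to counteract it.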

In conclusion, machine learning can be used as a supporting tool to identify potential dropouts.

Supervisors: Christian Igel, Stephen Alstrup
Censor:, ITU