Learning from Educational Data: Improving Methods and Theoretical Guarantees for Data Mining

Research output: Book/Report (Ph.D. thesis)

  • Stephan Sloth Lorenzen
This thesis summarizes my PhD project. The structure of both the thesis and the project follows the philosophy behind the Danish Center for Big Data Analytics driven Innovation (DABAI): in collaboration with companies, we develop solutions for educational data mining. Taking inspiration from the challenges faced, we define and investigate research problems within the areas of algorithms and machine learning.

During my project, I have worked with the Danish companies Clio and MaCom. With Clio, the main objective has been to provide teachers with insight into primary school students. We do so through performance prediction in an online quiz system and by analyzing behavioral patterns observed in log data in order to determine optimal study behavior. With MaCom, we investigate methods for detecting ghostwriters in high school, that is, external authors hired by students to write their essays. We extend this work to a tool for analyzing and tracking changes in the writing style of high school students, providing insights for teachers.

Based on the problems faced while working with Clio, we develop novel techniques for improving budgeted maximum inner product search, an important algorithmic ingredient in many data mining methods.

Furthermore, we investigate theoretical bounds for majority vote classifiers, providing theoretical guarantees for the random forest classifier. While these bounds are often still too loose for practical use, the area of research is important, as highlighted by our work with MaCom.


Finally, the thesis concludes with an overview of the company collaboration and a discussion of the challenges faced during the collaboration.
Original language: English
Publisher: Department of Computer Science, Faculty of Science, University of Copenhagen
Publication status: Published - 2019

ID: 248811818