On predicting student performance using low-rank matrix factorization techniques

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Predicting the score of a student is one of the important problems in educational data mining. The scores obtained by an individual student reflect how well that student understands and applies the knowledge conveyed in class. A reliable performance prediction enables teachers to identify weak students who require remedial support, generate adaptive hints, and improve the learning of students. This work focuses on predicting the score of students in the quiz system of the Clio Online learning platform, the largest Danish supplier of online learning materials, covering 90% of Danish elementary schools and hundreds of thousands of students. In particular, we formalize our prediction task as a weighted low-rank matrix factorization (LRMF) problem, which has attracted much attention in the machine learning community because of its extensive applications in collaborative filtering. We investigate two variants of weighted LRMF, standard weighted LRMF and weighted non-negative LRMF, and apply the Expectation-Maximization (EM) procedure to solve them. We also study different Singular Value Decomposition (SVD)-based initialization methods for these variants, since the EM method is sensitive to the initial values. Experimental results on the Clio Online data set confirm that the proposed initialization methods lead to very fast convergence. Regarding the prediction accuracy, surprisingly, the advanced EM method is only slightly better than the baseline approach based on the global mean score and student/quiz bias. In order to understand the behaviour of the algorithm, we extract a dense subset of the data set and visualize its eigenvalue spectrum. The highly skewed eigenvalue spectrum of this subset explains this finding. We conclude that, since the active students on the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information. This means that the simple baseline approach achieves performance similar to that of the more advanced methods. In future work, we will restrict the quiz data set, e.g. by only including quizzes with a time limit or by considering several quiz types. We expect that students will then behave differently and that the advanced EM methods might improve the prediction accuracy.
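
To make the two approaches compared in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a student-by-quiz score matrix `X` and a 0/1 weight matrix `W` marking observed scores. The function names `bias_baseline` and `em_weighted_lrmf`, the exact bias formulas, and the mean-fill starting point are all illustrative assumptions; the SVD-based initializations studied in the paper and the non-negative variant are not reproduced here.

```python
import numpy as np


def bias_baseline(X, W):
    """Global mean + student bias + quiz bias, fitted on observed entries only.

    X: (students x quizzes) score matrix; unobserved cells must hold a finite
       placeholder such as 0 (they are masked out by W).
    W: 0/1 matrix of the same shape, 1 where a score was observed.
    The bias formulas are an assumed, simple variant of such a baseline.
    """
    mu = (W * X).sum() / W.sum()                      # global mean score
    row_cnt = np.maximum(W.sum(axis=1), 1)            # avoid division by zero
    b_student = (W * (X - mu)).sum(axis=1) / row_cnt  # mean residual per student
    col_cnt = np.maximum(W.sum(axis=0), 1)
    resid = W * (X - mu - b_student[:, None])         # residual after student bias
    b_quiz = resid.sum(axis=0) / col_cnt              # mean residual per quiz
    return mu + b_student[:, None] + b_quiz[None, :]


def em_weighted_lrmf(X, W, rank, n_iter=50):
    """EM-style iteration for weighted LRMF with 0/1 weights.

    E-step: fill unobserved cells with the current low-rank estimate.
    M-step: replace the estimate by the best rank-`rank` approximation of the
            filled matrix, obtained from a truncated SVD.
    The starting point (global mean everywhere) is an assumption.
    """
    mu = (W * X).sum() / W.sum()
    L = np.full(X.shape, mu)                          # rank-1 starting estimate
    for _ in range(n_iter):
        filled = W * X + (1 - W) * L                  # E-step
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # M-step
    return L
```

Both functions return a full matrix of predicted scores, so they can be compared on held-out entries with the same error measure. With a very small `rank`, the EM estimate has little more freedom than the bias model, which is in line with the abstract's observation that the two perform similarly on sparse data.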

Original language: English
Title of host publication: ECEL17 - Proceedings of the 16th European Conference on e-Learning
Editors: Paula Peres, Anabela Mesquita
Number of pages: 9
Publisher: Academic Conferences and Publishing International
Publication date: Oct 2017
Pages: 326-334
ISBN (Electronic): 978-1911218593
Publication status: Published - Oct 2017
Event: 16th European Conference on e-Learning - Porto, Portugal
Duration: 26 Oct 2017 - 27 Oct 2017
Conference number: 16

Conference

Conference: 16th European Conference on e-Learning
Number: 16
Country: Portugal
City: Porto
Period: 26/10/2017 - 27/10/2017

    Research areas

  • Collaborative filtering, Matrix factorization, Predicting student performance

ID: 188480687