On predicting student performance using low-rank matrix factorization techniques

Publication: Contribution to book/anthology/report, Conference article in proceedings, Research, peer-reviewed

Predicting the score of a student is one of the important problems in educational data mining. The scores obtained by an individual student reflect how well the student understands and applies the knowledge conveyed in class. A reliable performance prediction enables teachers to identify weak students who require remedial support, to generate adaptive hints, and to improve students' learning. This work focuses on predicting student scores in the quiz system of the Clio Online learning platform, the largest Danish supplier of online learning materials, which covers 90% of Danish elementary schools and hundreds of thousands of students. In particular, we formalize the prediction task as a weighted low-rank matrix factorization (LRMF) problem, which has attracted considerable attention in the machine learning community because of its extensive applications in collaborative filtering. We investigate two variants of weighted LRMF, namely standard weighted LRMF and weighted non-negative LRMF, and apply the Expectation-Maximization (EM) procedure to solve them. Because the EM method is sensitive to its initial values, we also study different Singular Value Decomposition (SVD)-based initialization methods for these variants. Experimental results on the Clio Online data set confirm that the proposed initialization methods lead to very fast convergence. Regarding prediction accuracy, surprisingly, the advanced EM method is only slightly better than a baseline approach based on the global mean score and student/quiz biases. To understand the behaviour of the algorithm, we extract a dense subset of the data set and visualize its eigenvalue spectrum. The highly skewed eigenvalue spectrum of this subset explains our findings. We conclude that, since the active students on the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation captures enough information.
This means that the simple baseline approach achieves performance similar to that of the more advanced methods. In future work, we will restrict the quiz data set, e.g., by including only quizzes with a time limit or by considering several quiz types. We expect that students will behave differently on such quizzes and that the advanced EM methods might then improve the prediction accuracy.
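A minimal sketch of the baseline described in the abstract, assuming it predicts a score as the global mean plus a student bias and a quiz bias, with each bias estimated as a (optionally shrunken) mean residual over the observed scores. The estimation order, the shrinkage parameter `reg`, and the helper names `fit_baseline` and `predict` are assumptions for illustration; the abstract does not specify how the biases are computed.

```python
from collections import defaultdict


def fit_baseline(ratings, reg=0.0):
    """Fit mu + b_student + b_quiz (hypothetical helper).

    ratings: list of (student, quiz, score) triples for observed scores only.
    reg: assumed shrinkage term; 0.0 gives plain mean residuals.
    """
    # Global mean over all observed scores.
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Student bias: mean residual after removing the global mean.
    s_sum, s_cnt = defaultdict(float), defaultdict(int)
    for s, _, r in ratings:
        s_sum[s] += r - mu
        s_cnt[s] += 1
    b_s = {s: s_sum[s] / (s_cnt[s] + reg) for s in s_sum}

    # Quiz bias: mean residual after removing the global mean and student bias.
    q_sum, q_cnt = defaultdict(float), defaultdict(int)
    for s, q, r in ratings:
        q_sum[q] += r - mu - b_s[s]
        q_cnt[q] += 1
    b_q = {q: q_sum[q] / (q_cnt[q] + reg) for q in q_sum}
    return mu, b_s, b_q


def predict(model, student, quiz):
    """Predicted score; unseen students or quizzes fall back to zero bias."""
    mu, b_s, b_q = model
    return mu + b_s.get(student, 0.0) + b_q.get(quiz, 0.0)
```

Because the model has only one parameter per student and per quiz, it behaves like a very coarse low-rank approximation, which is consistent with the abstract's observation that it performs close to the EM methods on sparse data.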
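The EM procedure for weighted LRMF named in the abstract can be sketched as follows, assuming 0/1 weights (observed versus missing scores): the E-step fills each missing entry with the current low-rank estimate, and the M-step takes a truncated SVD of the completed matrix. The initialization shown (global-mean filling followed by a rank-k SVD) is only one hypothetical SVD-based choice; the paper's own initialization methods and its non-negative variant are not reproduced here, and `rank_k` and `em_lrmf` are illustrative names.

```python
import numpy as np


def rank_k(X, k):
    """Best rank-k approximation of X in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def em_lrmf(R, W, k, n_iter=100, tol=1e-6):
    """EM for weighted LRMF with 0/1 weights (hypothetical helper).

    R: students x quizzes score matrix; entries where W is False are ignored
       (they may be NaN).
    W: boolean mask, True where a score was observed.
    k: target rank.
    """
    # Assumed SVD-based initialization: fill missing entries with the global
    # mean of the observed scores, then take a rank-k truncated SVD.
    mu = R[W].mean()
    X = rank_k(np.where(W, R, mu), k)
    for _ in range(n_iter):
        # E-step: keep observed scores, impute missing ones from the model.
        filled = np.where(W, R, X)
        # M-step: unweighted rank-k fit to the completed matrix.
        X_new = rank_k(filled, k)
        # Stop once the estimate changes very little between iterations.
        if np.linalg.norm(X_new - X) <= tol * max(np.linalg.norm(X), 1.0):
            return X_new
        X = X_new
    return X
```

A skewed spectrum of the kind discussed in the abstract can be inspected with `np.linalg.svd(D, compute_uv=False)` on a fully observed submatrix `D`: the eigenvalues of `D.T @ D` are the squared singular values of `D`, so a few dominant values indicate that a very low rank already captures most of the squared Frobenius norm.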

Original language: English
Title: ECEL17 - Proceedings of the 16th European Conference on e-Learning
Editors: Paula Peres, Anabela Mesquita
Number of pages: 9
Publisher: Academic Conferences and Publishing International
Publication date: Oct 2017
Pages: 326-334
ISBN (Electronic): 978-1911218593
Status: Published - Oct 2017
Event: 16th European Conference on e-Learning - Porto, Portugal
Duration: 26 Oct 2017 to 27 Oct 2017
Conference number: 16

ID: 188480687