On predicting student performance using low-rank matrix factorization techniques
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
On predicting student performance using low-rank matrix factorization techniques. / Lorenzen, Stephan Sloth; Pham, Dang Ninh; Alstrup, Stephen.
ECEL17 - Proceedings of the 16th European Conference on e-Learning. ed. / Paula Peres; Anabela Mesquita. Academic Conferences and Publishing International, 2017. p. 326-334.
RIS
TY - GEN
T1 - On predicting student performance using low-rank matrix factorization techniques
AU - Lorenzen, Stephan Sloth
AU - Pham, Dang Ninh
AU - Alstrup, Stephen
N1 - Conference code: 16
PY - 2017/10
Y1 - 2017/10
N2 - Predicting the score of a student is one of the important problems in educational data mining. The scores obtained by an individual student reflect how well the student understands and applies the knowledge conveyed in class. A reliable performance prediction enables teachers to identify weak students who require remedial support, generate adaptive hints, and improve the learning of students. This work focuses on predicting the scores of students in the quiz system of the Clio Online learning platform, the largest Danish supplier of online learning materials, covering 90% of Danish elementary schools and hundreds of thousands of students. In particular, we formalize our prediction task as the weighted low-rank matrix factorization (LRMF) problem, which has attracted much attention in the machine learning community due to its extensive applications in collaborative filtering. We investigate two variants of weighted LRMF, standard weighted LRMF and weighted non-negative LRMF, and apply the Expectation-Maximization (EM) procedure to solve them. We also study different Singular Value Decomposition (SVD)-based initialization methods for these variants, since the EM method is sensitive to the initial values. Experimental results on the Clio Online data set confirm that the proposed initialization methods lead to very fast convergence. Regarding prediction accuracy, surprisingly, the advanced EM method is only slightly better than the baseline approach based on the global mean score and student/quiz bias. To understand the behaviour of the algorithm, we extract a dense subset of the data set and visualize its eigenvalue spectrum. The highly skewed eigenvalue spectrum of this subset explains these findings. We conclude that, since the active students on the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information.
This means that the simple baseline approach achieves performance similar to that of the more advanced methods. In future work, we will restrict the quiz data set, e.g. by only including quizzes with a time limit, considering several quiz types. We expect that students will then behave differently and that the advanced EM methods might improve the prediction accuracy.
KW - Collaborative filtering
KW - Matrix factorization
KW - Predicting student performance
UR - http://www.scopus.com/inward/record.url?scp=85037533014&partnerID=8YFLogxK
M3 - Article in proceedings
AN - SCOPUS:85037533014
SP - 326
EP - 334
BT - ECEL17 - Proceedings of the 16th European Conference on e-Learning
A2 - Peres, Paula
A2 - Mesquita, Anabela
PB - Academic Conferences and Publishing International
Y2 - 26 October 2017 through 27 October 2017
ER -
ID: 188480687
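
The two predictors compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the update rules nor the SVD-based initializations, so the code below assumes 0/1 weights (1 = observed score), a common fill-and-truncate EM scheme for weighted low-rank approximation, and initialization from the baseline predictions. The function names baseline_predict and em_weighted_lrmf are hypothetical.

```python
import numpy as np


def baseline_predict(X, W):
    """Global mean plus student (row) and quiz (column) bias.

    X: students x quizzes score matrix; W: 0/1 mask of observed entries.
    Only observed entries contribute to the estimates.
    """
    mu = (W * X).sum() / W.sum()
    # Guard against rows/columns with no observations.
    row_cnt = np.maximum(W.sum(axis=1), 1)
    col_cnt = np.maximum(W.sum(axis=0), 1)
    b_student = (W * (X - mu)).sum(axis=1) / row_cnt
    b_quiz = (W * (X - mu - b_student[:, None])).sum(axis=0) / col_cnt
    return mu + b_student[:, None] + b_quiz[None, :]


def em_weighted_lrmf(X, W, rank, n_iter=50):
    """EM-style weighted low-rank approximation with 0/1 weights.

    Assumption: initialization from the baseline predictions; the paper
    studies SVD-based initializations that are not reproduced here.
    """
    L = baseline_predict(X, W)
    for _ in range(n_iter):
        # E-step: keep observed scores, fill the rest with current estimates.
        filled = W * X + (1 - W) * L
        # M-step: best rank-`rank` approximation of the filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-1 "scores" with roughly 40% of entries hidden.
    X = np.outer(rng.uniform(0.5, 1.0, 30), rng.uniform(0.5, 1.0, 20))
    W = (rng.uniform(size=X.shape) < 0.6).astype(float)
    hidden = W == 0
    for name, pred in [("baseline", baseline_predict(X, W)),
                       ("EM rank-1", em_weighted_lrmf(X, W, rank=1))]:
        rmse = np.sqrt(np.mean((pred[hidden] - X[hidden]) ** 2))
        print(name, "RMSE on hidden entries:", rmse)
```

On real quiz data the comparison would be made on held-out observed scores rather than on synthetic hidden entries; the synthetic matrix here only makes the sketch self-contained.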