On predicting student performance using low-rank matrix factorization techniques

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

On predicting student performance using low-rank matrix factorization techniques. / Lorenzen, Stephan Sloth; Pham, Dang Ninh; Alstrup, Stephen.

ECEL17 - Proceedings of the 16th European Conference on e-Learning. ed. / Paula Peres; Anabela Mesquita. Academic Conferences and Publishing International, 2017. p. 326-334.

Harvard

Lorenzen, SS, Pham, DN & Alstrup, S 2017, On predicting student performance using low-rank matrix factorization techniques. in P Peres & A Mesquita (eds), ECEL17 - Proceedings of the 16th European Conference on e-Learning. Academic Conferences and Publishing International, pp. 326-334, 16th European Conference on e-Learning, Porto, Portugal, 26/10/2017. <https://search.proquest.com/docview/1968935175/abstract/753AFBBC2F2F49B9PQ/1?accountid=13607>

APA

Lorenzen, S. S., Pham, D. N., & Alstrup, S. (2017). On predicting student performance using low-rank matrix factorization techniques. In P. Peres, & A. Mesquita (Eds.), ECEL17 - Proceedings of the 16th European Conference on e-Learning (pp. 326-334). Academic Conferences and Publishing International. https://search.proquest.com/docview/1968935175/abstract/753AFBBC2F2F49B9PQ/1?accountid=13607

Vancouver

Lorenzen SS, Pham DN, Alstrup S. On predicting student performance using low-rank matrix factorization techniques. In Peres P, Mesquita A, editors, ECEL17 - Proceedings of the 16th European Conference on e-Learning. Academic Conferences and Publishing International. 2017. p. 326-334

Author

Lorenzen, Stephan Sloth ; Pham, Dang Ninh ; Alstrup, Stephen. / On predicting student performance using low-rank matrix factorization techniques. ECEL17 - Proceedings of the 16th European Conference on e-Learning. editor / Paula Peres ; Anabela Mesquita. Academic Conferences and Publishing International, 2017. pp. 326-334

Bibtex

@inproceedings{eb5f636ac89d43898876a41d2a44fca4,

title = "On predicting student performance using low-rank matrix factorization techniques",

abstract = "Predicting the score of a student is one of the important problems in educational data mining. The scores obtained by an individual student reflect how well the student understands and applies the knowledge conveyed in class. A reliable performance prediction enables teachers to identify weak students that require remedial support, generate adaptive hints, and improve the learning of students. This work focuses on predicting the score of students in the quiz system of the Clio Online learning platform, the largest Danish supplier of online learning materials, covering 90% of Danish elementary schools and hundreds of thousands of students. In particular, we formalize our prediction task as the weighted low-rank matrix factorization (LRMF) problem, a very attractive problem in the machine learning community due to its extensive applications in collaborative filtering. We investigate two variants of weighted LRMF, standard weighted LRMF and weighted non-negative LRMF, and apply the Expectation-Maximization (EM) procedure to solve them. We also study different Singular Value Decomposition (SVD)-based initialization methods for these variants since the EM method is sensitive to the initial values. Experimental results on the Clio Online data set confirm that the proposed initialization methods lead to very fast convergence. Regarding the prediction accuracy, surprisingly, the advanced EM method is just slightly better than the baseline approach based on the global mean score and student/quiz bias. In order to understand the behaviour of the algorithm, we extract a dense subset of the data set and visualize its eigenvalue spectrum. The highly skewed eigenvalue spectrum of this subset explains our interesting findings. We conclude that since the active students in the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information. This means that the simple baseline approach achieves similar performance compared to other advanced methods. In future work, we will restrict the quiz data set, e.g. only including quizzes with a time limit, considering several quiz types. We expect that students will behave differently and the advanced EM methods might improve the prediction accuracy.",

keywords = "Collaborative filtering, Matrix factorization, Predicting student performance",

author = "Lorenzen, {Stephan Sloth} and Pham, {Dang Ninh} and Alstrup, Stephen",

year = "2017",

month = oct,

language = "English",

pages = "326--334",

editor = "Paula Peres and Anabela Mesquita",

booktitle = "ECEL17 - Proceedings of the 16th European Conference on e-Learning",

publisher = "Academic Conferences and Publishing International",

note = "Conference date: 26-10-2017 through 27-10-2017",

}

RIS

TY - GEN

T1 - On predicting student performance using low-rank matrix factorization techniques

AU - Lorenzen, Stephan Sloth

AU - Pham, Dang Ninh

AU - Alstrup, Stephen

N1 - Conference code: 16

PY - 2017/10

Y1 - 2017/10

AB - Predicting the score of a student is one of the important problems in educational data mining. The scores obtained by an individual student reflect how well the student understands and applies the knowledge conveyed in class. A reliable performance prediction enables teachers to identify weak students that require remedial support, generate adaptive hints, and improve the learning of students. This work focuses on predicting the score of students in the quiz system of the Clio Online learning platform, the largest Danish supplier of online learning materials, covering 90% of Danish elementary schools and hundreds of thousands of students. In particular, we formalize our prediction task as the weighted low-rank matrix factorization (LRMF) problem, a very attractive problem in the machine learning community due to its extensive applications in collaborative filtering. We investigate two variants of weighted LRMF, standard weighted LRMF and weighted non-negative LRMF, and apply the Expectation-Maximization (EM) procedure to solve them. We also study different Singular Value Decomposition (SVD)-based initialization methods for these variants since the EM method is sensitive to the initial values. Experimental results on the Clio Online data set confirm that the proposed initialization methods lead to very fast convergence. Regarding the prediction accuracy, surprisingly, the advanced EM method is just slightly better than the baseline approach based on the global mean score and student/quiz bias. In order to understand the behaviour of the algorithm, we extract a dense subset of the data set and visualize its eigenvalue spectrum. The highly skewed eigenvalue spectrum of this subset explains our interesting findings. We conclude that since the active students in the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information. This means that the simple baseline approach achieves similar performance compared to other advanced methods. In future work, we will restrict the quiz data set, e.g. only including quizzes with a time limit, considering several quiz types. We expect that students will behave differently and the advanced EM methods might improve the prediction accuracy.

KW - Collaborative filtering

KW - Matrix factorization

KW - Predicting student performance

UR - http://www.scopus.com/inward/record.url?scp=85037533014&partnerID=8YFLogxK

M3 - Article in proceedings

AN - SCOPUS:85037533014

SP - 326

EP - 334

BT - ECEL17 - Proceedings of the 16th European Conference on e-Learning

A2 - Peres, Paula

A2 - Mesquita, Anabela

PB - Academic Conferences and Publishing International

Y2 - 26 October 2017 through 27 October 2017

ER -

ID: 188480687

Department of Computer Science