On the initialization of long short-term memory networks

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Mostafa Mehdipour Ghazi
Mads Nielsen
Akshay Pai
Marc Modat
M. Jorge Cardoso
Sébastien Ourselin
Lauge Sørensen

Weight initialization is important for faster convergence and stable training of deep neural networks. This paper develops a robust initialization method to address training instability in long short-term memory (LSTM) networks. The method is based on a normalized random initialization of the network weights that aims to keep the variance of the network input and output in the same range. It is applied to standard LSTMs for univariate time series regression and to LSTMs robust to missing values for multivariate disease progression modeling. In all cases, the proposed method outperforms state-of-the-art initialization techniques in terms of training convergence and generalization performance of the obtained solution.
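
The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea of variance-preserving random initialization, using the well-known Glorot (Xavier) uniform rule as a stand-in rather than the scheme proposed in the paper. The function names `glorot_uniform` and `init_lstm_weights`, the gate labels, and the layer sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def glorot_uniform(rng, fan_out, fan_in):
    """Draw a (fan_out, fan_in) matrix with variance 2 / (fan_in + fan_out).

    This is the standard Glorot uniform rule, used here as an assumed
    stand-in; it is NOT the initialization derived in the paper.
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))


def init_lstm_weights(n_in, n_hidden, seed=0):
    """Initialize input (W), recurrent (U), and bias (b) terms per LSTM gate."""
    rng = np.random.default_rng(seed)
    gates = ("input", "forget", "cell", "output")  # hypothetical gate labels
    params = {}
    for gate in gates:
        params[gate] = {
            "W": glorot_uniform(rng, n_hidden, n_in),      # input-to-hidden
            "U": glorot_uniform(rng, n_hidden, n_hidden),  # hidden-to-hidden
            "b": np.zeros(n_hidden),                       # biases start at zero
        }
    return params


# Example with assumed sizes: 8 input features, 16 hidden units.
params = init_lstm_weights(n_in=8, n_hidden=16)
```

Scaling the sampling range by the layer's fan-in and fan-out is what keeps the signal variance roughly constant from layer input to output. The paper's method normalizes the weights with the same aim, but its details should be taken from the publication itself.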

Original language	English
Title of host publication	Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors	Tom Gedeon, Kok Wai Wong, Minho Lee
Number of pages	12
Publisher	Springer VS
Publication date	2019
Pages	275-286
ISBN (Print)	9783030367077
DOIs	https://doi.org/10.1007/978-3-030-36708-4_23
Publication status	Published - 2019
Event	26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia. Duration: 12 Dec 2019 → 15 Dec 2019

Conference

Conference	26th International Conference on Neural Information Processing, ICONIP 2019
Country	Australia
City	Sydney
Period	12/12/2019 → 15/12/2019

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11953 LNCS
ISSN	0302-9743

Research areas

Deep neural networks, Disease progression modeling, Initialization, Long short-term memory, Time series regression

Department of Computer Science