Algorithms for estimating the partition function of restricted Boltzmann machines

Research output: Contribution to journalJournal articleResearchpeer-review

Standard

Algorithms for estimating the partition function of restricted Boltzmann machines. / Krause, Oswin; Fischer, Asja; Igel, Christian.

In: Artificial Intelligence, Vol. 278, 103195, 01.2020.

Research output: Contribution to journalJournal articleResearchpeer-review

Harvard

Krause, O, Fischer, A & Igel, C 2020, 'Algorithms for estimating the partition function of restricted Boltzmann machines', Artificial Intelligence, vol. 278, 103195. https://doi.org/10.1016/j.artint.2019.103195

APA

Krause, O., Fischer, A., & Igel, C. (2020). Algorithms for estimating the partition function of restricted Boltzmann machines. Artificial Intelligence, 278, [103195]. https://doi.org/10.1016/j.artint.2019.103195

Vancouver

Krause O, Fischer A, Igel C. Algorithms for estimating the partition function of restricted Boltzmann machines. Artificial Intelligence. 2020 Jan;278. 103195. https://doi.org/10.1016/j.artint.2019.103195

Author

Krause, Oswin ; Fischer, Asja ; Igel, Christian. / Algorithms for estimating the partition function of restricted Boltzmann machines. In: Artificial Intelligence. 2020 ; Vol. 278.

Bibtex

@article{150f72f5a5bf471d9d3e9c11de089459,
title = "Algorithms for estimating the partition function of restricted Boltzmann machines",
abstract = "Accurate estimates of the normalization constants (partition functions) of energy-based probabilistic models (Markov random fields) are highly important, for example, for assessing the performance of models, monitoring training progress, and conducting likelihood ratio tests. Several algorithms for estimating the partition function (in relation to a reference distribution) have been introduced, including Annealed Importance Sampling (AIS) and Bennett's Acceptance Ratio method (BAR). However, their conceptual similarities and differences have not been worked out so far and systematic comparisons of their behavior in practice have been missing. We devise a unifying theoretical framework for these algorithms, which comprises existing variants and suggests new approaches. It is based on a generalized form of Crooks' equality linking the expectation over a distribution of samples generated by a transition operator to the expectation over the distribution induced by the reversed operator. The framework covers different ways of generating samples, such as parallel tempering and path sampling. An empirical comparison revealed the differences between the methods when estimating the partition function of restricted Boltzmann machines and Ising models. In our experiments, BAR using parallel tempering worked well with a small number of bridging distributions, while path sampling based AIS performed best when many bridging distributions were available. Because BAR gave the overall best results, we favor it over AIS. Furthermore, the experiments showed the importance of choosing a proper reference distribution.",
keywords = "Annealed importance sampling, Bennett's acceptance ratio, Bridge sampling, Crooks' equality, Ising model, Parallel tempering, Partition function estimation, Restricted Boltzmann machines",
author = "Oswin Krause and Asja Fischer and Christian Igel",
year = "2020",
month = jan,
doi = "10.1016/j.artint.2019.103195",
language = "English",
volume = "278",
journal = "Artificial Intelligence",
issn = "0004-3702",
publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Algorithms for estimating the partition function of restricted Boltzmann machines

AU - Krause, Oswin

AU - Fischer, Asja

AU - Igel, Christian

PY - 2020/1

Y1 - 2020/1

N2 - Accurate estimates of the normalization constants (partition functions) of energy-based probabilistic models (Markov random fields) are highly important, for example, for assessing the performance of models, monitoring training progress, and conducting likelihood ratio tests. Several algorithms for estimating the partition function (in relation to a reference distribution) have been introduced, including Annealed Importance Sampling (AIS) and Bennett's Acceptance Ratio method (BAR). However, their conceptual similarities and differences have not been worked out so far and systematic comparisons of their behavior in practice have been missing. We devise a unifying theoretical framework for these algorithms, which comprises existing variants and suggests new approaches. It is based on a generalized form of Crooks' equality linking the expectation over a distribution of samples generated by a transition operator to the expectation over the distribution induced by the reversed operator. The framework covers different ways of generating samples, such as parallel tempering and path sampling. An empirical comparison revealed the differences between the methods when estimating the partition function of restricted Boltzmann machines and Ising models. In our experiments, BAR using parallel tempering worked well with a small number of bridging distributions, while path sampling based AIS performed best when many bridging distributions were available. Because BAR gave the overall best results, we favor it over AIS. Furthermore, the experiments showed the importance of choosing a proper reference distribution.

AB - Accurate estimates of the normalization constants (partition functions) of energy-based probabilistic models (Markov random fields) are highly important, for example, for assessing the performance of models, monitoring training progress, and conducting likelihood ratio tests. Several algorithms for estimating the partition function (in relation to a reference distribution) have been introduced, including Annealed Importance Sampling (AIS) and Bennett's Acceptance Ratio method (BAR). However, their conceptual similarities and differences have not been worked out so far and systematic comparisons of their behavior in practice have been missing. We devise a unifying theoretical framework for these algorithms, which comprises existing variants and suggests new approaches. It is based on a generalized form of Crooks' equality linking the expectation over a distribution of samples generated by a transition operator to the expectation over the distribution induced by the reversed operator. The framework covers different ways of generating samples, such as parallel tempering and path sampling. An empirical comparison revealed the differences between the methods when estimating the partition function of restricted Boltzmann machines and Ising models. In our experiments, BAR using parallel tempering worked well with a small number of bridging distributions, while path sampling based AIS performed best when many bridging distributions were available. Because BAR gave the overall best results, we favor it over AIS. Furthermore, the experiments showed the importance of choosing a proper reference distribution.

KW - Annealed importance sampling

KW - Bennett's acceptance ratio

KW - Bridge sampling

KW - Crooks' equality

KW - Ising model

KW - Parallel tempering

KW - Partition function estimation

KW - Restricted Boltzmann machines

UR - http://www.scopus.com/inward/record.url?scp=85074615132&partnerID=8YFLogxK

U2 - 10.1016/j.artint.2019.103195

DO - 10.1016/j.artint.2019.103195

M3 - Journal article

AN - SCOPUS:85074615132

VL - 278

JO - Artificial Intelligence

JF - Artificial Intelligence

SN - 0004-3702

M1 - 103195

ER -

ID: 231425133