Mixture Models for Spherical Data with Applications to Protein Bioinformatics

Research output: Chapter in Book/Report/Conference proceeding › Book chapter › Research › peer-review

Standard

Mixture Models for Spherical Data with Applications to Protein Bioinformatics. / Mardia, Kanti V.; Barber, Stuart; Burdett, Philippa M.; Kent, John T.; Hamelryck, Thomas.

Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Springer, 2022. p. 15-32 (Forum for Interdisciplinary Mathematics).

Harvard

Mardia, KV, Barber, S, Burdett, PM, Kent, JT & Hamelryck, T 2022, Mixture Models for Spherical Data with Applications to Protein Bioinformatics. in Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Springer, Forum for Interdisciplinary Mathematics, pp. 15-32. https://doi.org/10.1007/978-981-19-1044-9_2

APA

Mardia, K. V., Barber, S., Burdett, P. M., Kent, J. T., & Hamelryck, T. (2022). Mixture Models for Spherical Data with Applications to Protein Bioinformatics. In Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale (pp. 15-32). Springer. Forum for Interdisciplinary Mathematics. https://doi.org/10.1007/978-981-19-1044-9_2

Vancouver

Mardia KV, Barber S, Burdett PM, Kent JT, Hamelryck T. Mixture Models for Spherical Data with Applications to Protein Bioinformatics. In Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Springer. 2022. p. 15-32. (Forum for Interdisciplinary Mathematics). https://doi.org/10.1007/978-981-19-1044-9_2

Author

Mardia, Kanti V. ; Barber, Stuart ; Burdett, Philippa M. ; Kent, John T. ; Hamelryck, Thomas. / Mixture Models for Spherical Data with Applications to Protein Bioinformatics. Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Springer, 2022. pp. 15-32 (Forum for Interdisciplinary Mathematics).

Bibtex

@inbook{0ca5adaff8804a7f94eab06ae976e622,

title = "Mixture Models for Spherical Data with Applications to Protein Bioinformatics",

abstract = "Finite mixture models are fitted to spherical data. Kent distributions are used for the components of the mixture because they allow considerable flexibility. Previous work on such mixtures has used an approximate maximum likelihood estimator for the parameters of a single component. However, the approximation causes problems when using the EM algorithm to estimate the parameters in a mixture model. Hence, the exact maximum likelihood estimator is used here for the individual components. This paper is motivated by a challenging prize problem in structural bioinformatics of how proteins fold. It is known that hydrogen bonds play a key role in the folding of a protein. We explore this hydrogen bond geometry using a data set describing bonds between two amino acids in proteins. An appropriate coordinate system to represent the hydrogen bond geometry is proposed, with each bond represented as a point on a sphere. We fit mixtures of Kent distributions to different subsets of the hydrogen bond data to gain insight into how the secondary structure elements bond together, since the distribution of hydrogen bonds depends on which secondary structure elements are involved.",

author = "Mardia, {Kanti V.} and Stuart Barber and Burdett, {Philippa M.} and Kent, {John T.} and Thomas Hamelryck",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.",

year = "2022",

doi = "10.1007/978-981-19-1044-9_2",

language = "English",

isbn = "978-981-19-1043-2",

series = "Forum for Interdisciplinary Mathematics",

publisher = "Springer",

pages = "15--32",

booktitle = "Directional Statistics for Innovative Applications",

address = "Switzerland",

}

RIS

TY - CHAP

T1 - Mixture Models for Spherical Data with Applications to Protein Bioinformatics

AU - Mardia, Kanti V.

AU - Barber, Stuart

AU - Burdett, Philippa M.

AU - Kent, John T.

AU - Hamelryck, Thomas

PY - 2022

Y1 - 2022

N2 - Finite mixture models are fitted to spherical data. Kent distributions are used for the components of the mixture because they allow considerable flexibility. Previous work on such mixtures has used an approximate maximum likelihood estimator for the parameters of a single component. However, the approximation causes problems when using the EM algorithm to estimate the parameters in a mixture model. Hence, the exact maximum likelihood estimator is used here for the individual components. This paper is motivated by a challenging prize problem in structural bioinformatics of how proteins fold. It is known that hydrogen bonds play a key role in the folding of a protein. We explore this hydrogen bond geometry using a data set describing bonds between two amino acids in proteins. An appropriate coordinate system to represent the hydrogen bond geometry is proposed, with each bond represented as a point on a sphere. We fit mixtures of Kent distributions to different subsets of the hydrogen bond data to gain insight into how the secondary structure elements bond together, since the distribution of hydrogen bonds depends on which secondary structure elements are involved.

AB - Finite mixture models are fitted to spherical data. Kent distributions are used for the components of the mixture because they allow considerable flexibility. Previous work on such mixtures has used an approximate maximum likelihood estimator for the parameters of a single component. However, the approximation causes problems when using the EM algorithm to estimate the parameters in a mixture model. Hence, the exact maximum likelihood estimator is used here for the individual components. This paper is motivated by a challenging prize problem in structural bioinformatics of how proteins fold. It is known that hydrogen bonds play a key role in the folding of a protein. We explore this hydrogen bond geometry using a data set describing bonds between two amino acids in proteins. An appropriate coordinate system to represent the hydrogen bond geometry is proposed, with each bond represented as a point on a sphere. We fit mixtures of Kent distributions to different subsets of the hydrogen bond data to gain insight into how the secondary structure elements bond together, since the distribution of hydrogen bonds depends on which secondary structure elements are involved.

UR - http://www.scopus.com/inward/record.url?scp=85132883072&partnerID=8YFLogxK

U2 - 10.1007/978-981-19-1044-9_2

DO - 10.1007/978-981-19-1044-9_2

M3 - Book chapter

AN - SCOPUS:85132883072

SN - 978-981-19-1043-2

T3 - Forum for Interdisciplinary Mathematics

SP - 15

EP - 32

BT - Directional Statistics for Innovative Applications

PB - Springer

ER -

ID: 314302529

Department of Computer Science