A systematic analysis of regression models for protein engineering

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A systematic analysis of regression models for protein engineering. / Michael, Richard; Kæstel-Hansen, Jacob; Groth, Peter Mørch; Bartels, Simon; Salomon, Jesper; Tian, Pengfei; Hatzakis, Nikos S.; Boomsma, Wouter.

In: PLOS Computational Biology, Vol. 20, No. 5, e1012061, 2024.

Harvard

Michael, R, Kæstel-Hansen, J, Groth, PM, Bartels, S, Salomon, J, Tian, P, Hatzakis, NS & Boomsma, W 2024, 'A systematic analysis of regression models for protein engineering', PLOS Computational Biology, vol. 20, no. 5, e1012061. https://doi.org/10.1371/journal.pcbi.1012061

APA

Michael, R., Kæstel-Hansen, J., Groth, P. M., Bartels, S., Salomon, J., Tian, P., Hatzakis, N. S., & Boomsma, W. (2024). A systematic analysis of regression models for protein engineering. PLOS Computational Biology, 20(5), Article e1012061. https://doi.org/10.1371/journal.pcbi.1012061

Vancouver

Michael R, Kæstel-Hansen J, Groth PM, Bartels S, Salomon J, Tian P et al. A systematic analysis of regression models for protein engineering. PLOS Computational Biology. 2024;20(5):e1012061. https://doi.org/10.1371/journal.pcbi.1012061

Author

Michael, Richard ; Kæstel-Hansen, Jacob ; Groth, Peter Mørch ; Bartels, Simon ; Salomon, Jesper ; Tian, Pengfei ; Hatzakis, Nikos S. ; Boomsma, Wouter. / A systematic analysis of regression models for protein engineering. In: PLOS Computational Biology. 2024 ; Vol. 20, No. 5.

Bibtex

@article{dd7552ea979047d092c70a7b9dbb30df,

title = "A systematic analysis of regression models for protein engineering",

abstract = "To optimize proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine Learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric, and how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how this can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.",

author = "Richard Michael and Jacob K{\ae}stel-Hansen and Groth, {Peter M{\o}rch} and Simon Bartels and Jesper Salomon and Pengfei Tian and Hatzakis, {Nikos S.} and Wouter Boomsma",

note = "Publisher Copyright: {\textcopyright} 2024 Michael et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2024",

doi = "10.1371/journal.pcbi.1012061",

language = "English",

volume = "20",

journal = "PLOS Computational Biology",

issn = "1553-734X",

publisher = "Public Library of Science",

number = "5",

}

RIS

TY - JOUR

T1 - A systematic analysis of regression models for protein engineering

AU - Michael, Richard

AU - Kæstel-Hansen, Jacob

AU - Groth, Peter Mørch

AU - Bartels, Simon

AU - Salomon, Jesper

AU - Tian, Pengfei

AU - Hatzakis, Nikos S.

AU - Boomsma, Wouter

N1 - Publisher Copyright: © 2024 Michael et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2024

Y1 - 2024

N2 - To optimize proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine Learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric, and how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how this can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.

AB - To optimize proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine Learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric, and how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how this can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.

U2 - 10.1371/journal.pcbi.1012061

DO - 10.1371/journal.pcbi.1012061

M3 - Journal article

C2 - 38701099

AN - SCOPUS:85192312471

VL - 20

JO - PLOS Computational Biology

JF - PLOS Computational Biology

SN - 1553-734X

IS - 5

M1 - e1012061

ER -

Department of Computer Science