Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients. / Millarch, Andreas Skov; Bonde, Alexander; Bonde, Mikkel; Klein, Kiril Vadomovic; Folke, Fredrik; Rudolph, Søren Steemann; Sillesen, Martin.

In: Frontiers in Digital Health, Vol. 5, 1249258, 2023.

Harvard

Millarch, AS, Bonde, A, Bonde, M, Klein, KV, Folke, F, Rudolph, SS & Sillesen, M 2023, 'Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients', Frontiers in Digital Health, vol. 5, 1249258. https://doi.org/10.3389/fdgth.2023.1249258

APA

Millarch, A. S., Bonde, A., Bonde, M., Klein, K. V., Folke, F., Rudolph, S. S., & Sillesen, M. (2023). Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients. Frontiers in Digital Health, 5, [1249258]. https://doi.org/10.3389/fdgth.2023.1249258

Vancouver

Millarch AS, Bonde A, Bonde M, Klein KV, Folke F, Rudolph SS et al. Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients. Frontiers in Digital Health. 2023;5. 1249258. https://doi.org/10.3389/fdgth.2023.1249258

Author

Millarch, Andreas Skov ; Bonde, Alexander ; Bonde, Mikkel ; Klein, Kiril Vadomovic ; Folke, Fredrik ; Rudolph, Søren Steemann ; Sillesen, Martin. / Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients. In: Frontiers in Digital Health. 2023 ; Vol. 5.

Bibtex

@article{8a2ff5ebffa6416b84803c00710b4f29,

title = "Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients",

abstract = "Introduction: Accurately predicting patient outcomes is crucial for improving healthcare delivery, but large-scale risk prediction models are often developed and tested on specific datasets where clinical parameters and outcomes may not fully reflect local clinical settings. Where this is the case, whether to opt for de-novo training of prediction models on local datasets, direct porting of externally trained models, or a transfer learning approach is not well studied, and constitutes the focus of this study. Using the clinical challenge of predicting mortality and hospital length of stay on a Danish trauma dataset, we hypothesized that a transfer learning approach of models trained on large external datasets would provide optimal prediction results compared to de-novo training on sparse but local datasets or directly porting externally trained models. Methods: Using an external dataset of trauma patients from the US Trauma Quality Improvement Program (TQIP) and a local dataset aggregated from the Danish Trauma Database (DTD) enriched with Electronic Health Record data, we tested a range of model-level approaches focused on predicting trauma mortality and hospital length of stay on DTD data. Modeling approaches included de-novo training of models on DTD data, direct porting of models trained on TQIP data to the DTD, and a transfer learning approach by training a model on TQIP data with subsequent transfer and retraining on DTD data. Furthermore, data-level approaches, including mixed dataset training and methods countering imbalanced outcomes (e.g., low mortality rates), were also tested. Results: Using a neural network trained on a mixed dataset consisting of a subset of TQIP and DTD, with class weighting and transfer learning (retraining on DTD), we achieved excellent results in predicting mortality, with a ROC-AUC of 0.988 and an F2-score of 0.866. 
The best-performing models for predicting long-term hospitalization were trained only on local data, achieving an ROC-AUC of 0.890 and an F1-score of 0.897, although only marginally better than alternative approaches. Conclusion: Our results suggest that when assessing the optimal modeling approach, it is important to have domain knowledge of how incidence rates and workflows compare between hospital systems and datasets where models are trained. Including data from other health-care systems is particularly beneficial when outcomes are suffering from class imbalance and low incidence. Scenarios where outcomes are not directly comparable are best addressed through either de-novo local training or a transfer learning approach.",

keywords = "artificial intelligence, healthcare system, length of stay, mortality, prediction model, surgery, transfer learning, trauma",

author = "Millarch, {Andreas Skov} and Alexander Bonde and Mikkel Bonde and Klein, {Kiril Vadomovic} and Fredrik Folke and Rudolph, {S{\o}ren Steemann} and Martin Sillesen",

note = "Publisher Copyright: 2023 Millarch, Bonde, Bonde, Klein, Folke, Rudolph and Sillesen.",

year = "2023",

doi = "10.3389/fdgth.2023.1249258",

language = "English",

volume = "5",

journal = "Frontiers in Digital Health",

issn = "2673-253X",

publisher = "Frontiers Media S.A.",

}

RIS

TY - JOUR

T1 - Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets

T2 - experiences from predicting outcomes of Danish trauma patients

AU - Millarch, Andreas Skov

AU - Bonde, Alexander

AU - Bonde, Mikkel

AU - Klein, Kiril Vadomovic

AU - Folke, Fredrik

AU - Rudolph, Søren Steemann

AU - Sillesen, Martin

N1 - Publisher Copyright: 2023 Millarch, Bonde, Bonde, Klein, Folke, Rudolph and Sillesen.

PY - 2023

Y1 - 2023

N2 - Introduction: Accurately predicting patient outcomes is crucial for improving healthcare delivery, but large-scale risk prediction models are often developed and tested on specific datasets where clinical parameters and outcomes may not fully reflect local clinical settings. Where this is the case, whether to opt for de-novo training of prediction models on local datasets, direct porting of externally trained models, or a transfer learning approach is not well studied, and constitutes the focus of this study. Using the clinical challenge of predicting mortality and hospital length of stay on a Danish trauma dataset, we hypothesized that a transfer learning approach of models trained on large external datasets would provide optimal prediction results compared to de-novo training on sparse but local datasets or directly porting externally trained models. Methods: Using an external dataset of trauma patients from the US Trauma Quality Improvement Program (TQIP) and a local dataset aggregated from the Danish Trauma Database (DTD) enriched with Electronic Health Record data, we tested a range of model-level approaches focused on predicting trauma mortality and hospital length of stay on DTD data. Modeling approaches included de-novo training of models on DTD data, direct porting of models trained on TQIP data to the DTD, and a transfer learning approach by training a model on TQIP data with subsequent transfer and retraining on DTD data. Furthermore, data-level approaches, including mixed dataset training and methods countering imbalanced outcomes (e.g., low mortality rates), were also tested. Results: Using a neural network trained on a mixed dataset consisting of a subset of TQIP and DTD, with class weighting and transfer learning (retraining on DTD), we achieved excellent results in predicting mortality, with a ROC-AUC of 0.988 and an F2-score of 0.866. 
The best-performing models for predicting long-term hospitalization were trained only on local data, achieving an ROC-AUC of 0.890 and an F1-score of 0.897, although only marginally better than alternative approaches. Conclusion: Our results suggest that when assessing the optimal modeling approach, it is important to have domain knowledge of how incidence rates and workflows compare between hospital systems and datasets where models are trained. Including data from other health-care systems is particularly beneficial when outcomes are suffering from class imbalance and low incidence. Scenarios where outcomes are not directly comparable are best addressed through either de-novo local training or a transfer learning approach.

AB - Introduction: Accurately predicting patient outcomes is crucial for improving healthcare delivery, but large-scale risk prediction models are often developed and tested on specific datasets where clinical parameters and outcomes may not fully reflect local clinical settings. Where this is the case, whether to opt for de-novo training of prediction models on local datasets, direct porting of externally trained models, or a transfer learning approach is not well studied, and constitutes the focus of this study. Using the clinical challenge of predicting mortality and hospital length of stay on a Danish trauma dataset, we hypothesized that a transfer learning approach of models trained on large external datasets would provide optimal prediction results compared to de-novo training on sparse but local datasets or directly porting externally trained models. Methods: Using an external dataset of trauma patients from the US Trauma Quality Improvement Program (TQIP) and a local dataset aggregated from the Danish Trauma Database (DTD) enriched with Electronic Health Record data, we tested a range of model-level approaches focused on predicting trauma mortality and hospital length of stay on DTD data. Modeling approaches included de-novo training of models on DTD data, direct porting of models trained on TQIP data to the DTD, and a transfer learning approach by training a model on TQIP data with subsequent transfer and retraining on DTD data. Furthermore, data-level approaches, including mixed dataset training and methods countering imbalanced outcomes (e.g., low mortality rates), were also tested. Results: Using a neural network trained on a mixed dataset consisting of a subset of TQIP and DTD, with class weighting and transfer learning (retraining on DTD), we achieved excellent results in predicting mortality, with a ROC-AUC of 0.988 and an F2-score of 0.866. 
The best-performing models for predicting long-term hospitalization were trained only on local data, achieving an ROC-AUC of 0.890 and an F1-score of 0.897, although only marginally better than alternative approaches. Conclusion: Our results suggest that when assessing the optimal modeling approach, it is important to have domain knowledge of how incidence rates and workflows compare between hospital systems and datasets where models are trained. Including data from other health-care systems is particularly beneficial when outcomes are suffering from class imbalance and low incidence. Scenarios where outcomes are not directly comparable are best addressed through either de-novo local training or a transfer learning approach.

KW - artificial intelligence

KW - healthcare system

KW - length of stay

KW - mortality

KW - prediction model

KW - surgery

KW - transfer learning

KW - trauma

U2 - 10.3389/fdgth.2023.1249258

DO - 10.3389/fdgth.2023.1249258

M3 - Journal article

C2 - 38026835

AN - SCOPUS:85177060379

VL - 5

JO - Frontiers in Digital Health

JF - Frontiers in Digital Health

SN - 2673-253X

M1 - 1249258

ER -

ID: 374649924

Department of Computer Science (Datalogisk Institut)