Identifying recurrent breast cancer patients in national health registries using machine learning

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Identifying recurrent breast cancer patients in national health registries using machine learning. / Lauritzen, Andreas David; Berg, Tobias; Jensen, Maj-Britt; Lillholm, Martin; Knoop, Ann.

In: Acta Oncologica, Vol. 62, No. 4, 2023, p. 350–357.

Harvard

Lauritzen, AD, Berg, T, Jensen, M-B, Lillholm, M & Knoop, A 2023, 'Identifying recurrent breast cancer patients in national health registries using machine learning', Acta Oncologica, vol. 62, no. 4, pp. 350–357. https://doi.org/10.1080/0284186X.2023.2201687

APA

Lauritzen, A. D., Berg, T., Jensen, M-B., Lillholm, M., & Knoop, A. (2023). Identifying recurrent breast cancer patients in national health registries using machine learning. Acta Oncologica, 62(4), 350–357. https://doi.org/10.1080/0284186X.2023.2201687

Vancouver

Lauritzen AD, Berg T, Jensen M-B, Lillholm M, Knoop A. Identifying recurrent breast cancer patients in national health registries using machine learning. Acta Oncologica. 2023;62(4):350–357. https://doi.org/10.1080/0284186X.2023.2201687

Author

Lauritzen, Andreas David ; Berg, Tobias ; Jensen, Maj-Britt ; Lillholm, Martin ; Knoop, Ann. / Identifying recurrent breast cancer patients in national health registries using machine learning. In: Acta Oncologica. 2023 ; Vol. 62, No. 4. pp. 350–357.

Bibtex

@article{aeee802850fc48be976b55406282b67b,
title = "Identifying recurrent breast cancer patients in national health registries using machine learning",
abstract = "Background: More than 4500 women are diagnosed with breast cancer each year in Denmark, however, despite adequate treatment 10-30% of patients will experience a recurrence. The Danish Breast Cancer Group (DBCG) stores information on breast cancer recurrence but to improve data completeness automated identification of patients with recurrence is needed. Methods: We included patient data from the DBCG, the National Pathology Database, and the National Patient Registry for patients with an invasive breast cancer diagnosis after 1999. In total, relevant features of 79,483 patients with a definitive surgery were extracted. A machine learning (ML) model was trained, using a simplistic encoding scheme of features, on a development sample covering 5333 patients with known recurrence and three times as many non-recurrent women. The model was validated in a validation sample consisting of 1006 patients with unknown recurrence status. Results: The ML model identified patients with recurrence with AUC-ROC of 0.93 (95% CI: 0.93-0.94) in the development, and an AUC-ROC of 0.86 (95% CI: 0.83-0.88) in the validation sample. Conclusion: An off-the-shelf ML model, trained using the simplistic encoding scheme, could identify recurrence patients across multiple national registries. This approach might potentially enable researchers and clinicians to better and faster identify patients with recurrence and reduce manual patient data interpretation.",
keywords = "Breast cancer, machine learning, recurrence, DBCG",
author = "Lauritzen, {Andreas David} and Tobias Berg and Maj-Britt Jensen and Martin Lillholm and Ann Knoop",
year = "2023",
doi = "10.1080/0284186X.2023.2201687",
language = "English",
volume = "62",
pages = "350--357",
journal = "Acta Oncologica",
issn = "1100-1704",
publisher = "Taylor \& Francis",
number = "4",

}

RIS

TY - JOUR

T1 - Identifying recurrent breast cancer patients in national health registries using machine learning

AU - Lauritzen, Andreas David

AU - Berg, Tobias

AU - Jensen, Maj-Britt

AU - Lillholm, Martin

AU - Knoop, Ann

PY - 2023

Y1 - 2023

N2 - Background: More than 4500 women are diagnosed with breast cancer each year in Denmark, however, despite adequate treatment 10-30% of patients will experience a recurrence. The Danish Breast Cancer Group (DBCG) stores information on breast cancer recurrence but to improve data completeness automated identification of patients with recurrence is needed. Methods: We included patient data from the DBCG, the National Pathology Database, and the National Patient Registry for patients with an invasive breast cancer diagnosis after 1999. In total, relevant features of 79,483 patients with a definitive surgery were extracted. A machine learning (ML) model was trained, using a simplistic encoding scheme of features, on a development sample covering 5333 patients with known recurrence and three times as many non-recurrent women. The model was validated in a validation sample consisting of 1006 patients with unknown recurrence status. Results: The ML model identified patients with recurrence with AUC-ROC of 0.93 (95% CI: 0.93-0.94) in the development, and an AUC-ROC of 0.86 (95% CI: 0.83-0.88) in the validation sample. Conclusion: An off-the-shelf ML model, trained using the simplistic encoding scheme, could identify recurrence patients across multiple national registries. This approach might potentially enable researchers and clinicians to better and faster identify patients with recurrence and reduce manual patient data interpretation.

AB - Background: More than 4500 women are diagnosed with breast cancer each year in Denmark, however, despite adequate treatment 10-30% of patients will experience a recurrence. The Danish Breast Cancer Group (DBCG) stores information on breast cancer recurrence but to improve data completeness automated identification of patients with recurrence is needed. Methods: We included patient data from the DBCG, the National Pathology Database, and the National Patient Registry for patients with an invasive breast cancer diagnosis after 1999. In total, relevant features of 79,483 patients with a definitive surgery were extracted. A machine learning (ML) model was trained, using a simplistic encoding scheme of features, on a development sample covering 5333 patients with known recurrence and three times as many non-recurrent women. The model was validated in a validation sample consisting of 1006 patients with unknown recurrence status. Results: The ML model identified patients with recurrence with AUC-ROC of 0.93 (95% CI: 0.93-0.94) in the development, and an AUC-ROC of 0.86 (95% CI: 0.83-0.88) in the validation sample. Conclusion: An off-the-shelf ML model, trained using the simplistic encoding scheme, could identify recurrence patients across multiple national registries. This approach might potentially enable researchers and clinicians to better and faster identify patients with recurrence and reduce manual patient data interpretation.

KW - Breast cancer

KW - machine learning

KW - recurrence

KW - DBCG

U2 - 10.1080/0284186X.2023.2201687

DO - 10.1080/0284186X.2023.2201687

M3 - Journal article

C2 - 37074036

VL - 62

SP - 350

EP - 357

JO - Acta Oncologica

JF - Acta Oncologica

SN - 1100-1704

IS - 4

ER -

ID: 347313666