Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Standard

Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms. / Kühl, Johanne; Elhakim, Mohammad Talal; Stougaard, Sarah Wordenskjold; Rasmussen, Benjamin Schnack Brandt; Nielsen, Mads; Gerke, Oke; Larsen, Lisbet Brønsro; Graumann, Ole.

In: European Radiology, 2024.

Harvard

Kühl, J, Elhakim, MT, Stougaard, SW, Rasmussen, BSB, Nielsen, M, Gerke, O, Larsen, LB & Graumann, O 2024, 'Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms', European Radiology. https://doi.org/10.1007/s00330-023-10423-7

APA

Kühl, J., Elhakim, M. T., Stougaard, S. W., Rasmussen, B. S. B., Nielsen, M., Gerke, O., Larsen, L. B., & Graumann, O. (2024). Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms. European Radiology. https://doi.org/10.1007/s00330-023-10423-7

Vancouver

Kühl J, Elhakim MT, Stougaard SW, Rasmussen BSB, Nielsen M, Gerke O et al. Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms. European Radiology. 2024. https://doi.org/10.1007/s00330-023-10423-7

Author

Kühl, Johanne ; Elhakim, Mohammad Talal ; Stougaard, Sarah Wordenskjold ; Rasmussen, Benjamin Schnack Brandt ; Nielsen, Mads ; Gerke, Oke ; Larsen, Lisbet Brønsro ; Graumann, Ole. / Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms. In: European Radiology. 2024.

Bibtex

@article{e0f60544530642f7b8439c5dd46f63ec,
title = "Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms",
abstract = "Objectives: To validate an AI system for standalone breast cancer detection on an entire screening population in comparison to first-reading breast radiologists. Materials and methods: All mammography screenings performed between August 4, 2014, and August 15, 2018, in the Region of Southern Denmark with follow-up within 24 months were eligible. Screenings were assessed as normal or abnormal by breast radiologists through double reading with arbitration. For an AI decision of normal or abnormal, two AI-score cut-off points were applied by matching at mean sensitivity (AIsens) and specificity (AIspec) of first readers. Accuracy measures were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and recall rate (RR). Results: The sample included 249,402 screenings (149,495 women) and 2033 breast cancers (72.6% screen-detected cancers, 27.4% interval cancers). AIsens had lower specificity (97.5% vs 97.7%; p < 0.0001) and PPV (17.5% vs 18.7%; p = 0.01) and a higher RR (3.0% vs 2.8%; p < 0.0001) than first readers. AIspec was comparable to first readers in terms of all accuracy measures. Both AIsens and AIspec detected significantly fewer screen-detected cancers (1166 (AIsens), 1156 (AIspec) vs 1252; p < 0.0001) but found more interval cancers compared to first readers (126 (AIsens), 117 (AIspec) vs 39; p < 0.0001) with varying types of cancers detected across multiple subgroups. Conclusion: Standalone AI can detect breast cancer at an accuracy level equivalent to the standard of first readers when the AI threshold point was matched at first reader specificity. However, AI and first readers detected a different composition of cancers. Clinical relevance statement: Replacing first readers with AI with an appropriate cut-off score could be feasible. 
AI-detected cancers not detected by radiologists suggest a potential increase in the number of cancers detected if AI is implemented to support double reading within screening, although the clinicopathological characteristics of detected cancers would not change significantly. Key Points: • Standalone AI cancer detection was compared to first readers in a double-read mammography screening population. • Standalone AI matched at first reader specificity showed no statistically significant difference in overall accuracy but detected different cancers. • With an appropriate threshold, AI-integrated screening can increase the number of detected cancers with similar clinicopathological characteristics.",
keywords = "Artificial intelligence, Breast cancer, Mammography, Screening",
author = "Johanne K{\"u}hl and Elhakim, {Mohammad Talal} and Stougaard, {Sarah Wordenskjold} and Rasmussen, {Benjamin Schnack Brandt} and Mads Nielsen and Oke Gerke and Larsen, {Lisbet Br{\o}nsro} and Ole Graumann",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2024",
doi = "10.1007/s00330-023-10423-7",
language = "English",
journal = "European Radiology",
issn = "0938-7994",
publisher = "Springer",

}

RIS

TY - JOUR

T1 - Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms

AU - Kühl, Johanne

AU - Elhakim, Mohammad Talal

AU - Stougaard, Sarah Wordenskjold

AU - Rasmussen, Benjamin Schnack Brandt

AU - Nielsen, Mads

AU - Gerke, Oke

AU - Larsen, Lisbet Brønsro

AU - Graumann, Ole

N1 - Publisher Copyright: © 2023, The Author(s).

PY - 2024

Y1 - 2024

N2 - Objectives: To validate an AI system for standalone breast cancer detection on an entire screening population in comparison to first-reading breast radiologists. Materials and methods: All mammography screenings performed between August 4, 2014, and August 15, 2018, in the Region of Southern Denmark with follow-up within 24 months were eligible. Screenings were assessed as normal or abnormal by breast radiologists through double reading with arbitration. For an AI decision of normal or abnormal, two AI-score cut-off points were applied by matching at mean sensitivity (AIsens) and specificity (AIspec) of first readers. Accuracy measures were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and recall rate (RR). Results: The sample included 249,402 screenings (149,495 women) and 2033 breast cancers (72.6% screen-detected cancers, 27.4% interval cancers). AIsens had lower specificity (97.5% vs 97.7%; p < 0.0001) and PPV (17.5% vs 18.7%; p = 0.01) and a higher RR (3.0% vs 2.8%; p < 0.0001) than first readers. AIspec was comparable to first readers in terms of all accuracy measures. Both AIsens and AIspec detected significantly fewer screen-detected cancers (1166 (AIsens), 1156 (AIspec) vs 1252; p < 0.0001) but found more interval cancers compared to first readers (126 (AIsens), 117 (AIspec) vs 39; p < 0.0001) with varying types of cancers detected across multiple subgroups. Conclusion: Standalone AI can detect breast cancer at an accuracy level equivalent to the standard of first readers when the AI threshold point was matched at first reader specificity. However, AI and first readers detected a different composition of cancers. Clinical relevance statement: Replacing first readers with AI with an appropriate cut-off score could be feasible. 
AI-detected cancers not detected by radiologists suggest a potential increase in the number of cancers detected if AI is implemented to support double reading within screening, although the clinicopathological characteristics of detected cancers would not change significantly. Key Points: • Standalone AI cancer detection was compared to first readers in a double-read mammography screening population. • Standalone AI matched at first reader specificity showed no statistically significant difference in overall accuracy but detected different cancers. • With an appropriate threshold, AI-integrated screening can increase the number of detected cancers with similar clinicopathological characteristics.

KW - Artificial intelligence

KW - Breast cancer

KW - Mammography

KW - Screening

UR - http://www.scopus.com/inward/record.url?scp=85176093211&partnerID=8YFLogxK

U2 - 10.1007/s00330-023-10423-7

DO - 10.1007/s00330-023-10423-7

M3 - Journal article

C2 - 37938386

AN - SCOPUS:85176093211

JO - European Radiology

JF - European Radiology

SN - 0938-7994

ER -

ID: 374121671