Independent External Validation of Artificial Intelligence Algorithms for Automated Interpretation of Screening Mammography: A Systematic Review
Type
Article
Author/s
Anderson, Anna W
Marinovich, M Luke
Houssami, Nehmat
Lowry, Kathryn P
Elmore, Joann G
Buist, Diana S M
Hofvind, Solveig
Lee, Christoph I
Abstract
Purpose: The aim of this study was to describe the current state of science regarding independent external validation of artificial intelligence (AI) technologies for screening mammography.

Methods: A systematic review was performed across five databases (Embase, PubMed, IEEE Xplore, Engineering Village, and arXiv) through December 10, 2020. Studies that used screening examinations from real-world settings to externally validate AI algorithms for mammographic cancer detection were included. The main outcome was diagnostic accuracy, defined by area under the receiver operating characteristic curve (AUC). Performance was also compared between radiologists and either stand-alone AI or combined radiologist and AI interpretation. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.

Results: After data extraction, 13 studies met the inclusion criteria (148,361 total patients). Most studies (77% [n = 10]) evaluated commercially available AI algorithms. Studies included retrospective reader studies (46% [n = 6]), retrospective simulation studies (38% [n = 5]), or both (15% [n = 2]). Across 5 studies comparing stand-alone AI with radiologists, 60% (n = 3) demonstrated improved accuracy with AI (AUC improvement range, 0.02-0.13). All 5 studies comparing combined radiologist and AI interpretation with radiologists alone demonstrated improved accuracy with AI (AUC improvement range, 0.028-0.115). Most studies had risk for bias or applicability concerns for patient selection (69% [n = 9]) and the reference standard (69% [n = 9]). Only two studies obtained ground-truth cancer outcomes through regional cancer registry linkage.

Conclusions: To date, external validation efforts for AI technologies for screening mammography suggest small potential diagnostic accuracy improvements but have been retrospective in nature and suffer from risk for bias and applicability concerns.
Date
2022
Source title
Journal of the American College of Radiology
Volume
19
Issue
2
Publisher
Elsevier
Funding information
ARC 1194410
National Cancer Institute (R37 CA240403)
National Cancer Institute (P01CA154292)
National Breast Cancer Foundation Investigator Initiated Research Scheme grant (IIRS-20-011)
National Breast Cancer Foundation (grant #EC-21-001)
American Cancer Society Clinician Scientist Development Grant (CSDG-21-078-01-CPSH)
Licence
Creative Commons Attribution 4.0
Faculty/School
Faculty of Medicine and Health, Sydney School of Public Health