Abstract

There is great interest in artificial intelligence (AI) in health care. As with many innovations in medicine, there is a fine line between potential benefits and harms with the use of AI. This is especially important for preventive services such as cancer screening, where the trade-off between benefits and harms is delicate and involves millions of individuals. This Commentary clarifies clinical challenges of AI in cancer screening programs and explores solutions that can enable its optimal implementation.

Most AI tools in cancer screening aim to increase the detection of small abnormalities and thereby screening sensitivity, so-called computer-aided detection (CADe). Early work with AI focused on mammography screening for breast cancer. The US Food and Drug Administration (FDA) approved the first CADe system for mammography in 1998, and Medicare and Medicaid have reimbursed its use since 2002. In the 2000s, randomized controlled trials (RCTs) established the feasibility of CADe tools assisting radiologists in finding suspicious areas on mammograms for breast cancer screening.[1: Gilbert FJ, Astley SM, Gillan MG, et al. Single reading with computer-aided detection for screening mammography. N Engl J Med. 2008;359:1675-1684.] Registry studies also showed that CADe contributed to detection of more cancers at an earlier stage (ie, ductal carcinoma in situ).[2: Fenton JJ, Lee CI, Xing G, et al. Computer-aided detection in mammography: downstream effect on diagnostic testing, ductal carcinoma in situ treatment, and costs. JAMA Intern Med. 2014;174:2032-2034.][3: Fenton JJ, Xing G, Elmore JG, et al. Short-term outcomes of screening mammography using computer-aided detection: a population-based study of Medicare enrollees. Ann Intern Med. 2013;158:580-587.] CADe tools are now used for more than 80% of screening mammograms in the US.[4: Lehman CD, Wellman RD, Buist DS, et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med. 2015;175:1828-1837.]

More recently, CADe tools have been developed for colonoscopy screening for colorectal cancer. These tools indicate possible polyps with red boxes on the endoscopist’s screen in real time during the procedure. Regulators in the US, Europe, and Japan have recently approved the first CADe devices for colonoscopy, and they are being implemented in clinical practice. Five RCTs have shown that adenoma detection rates (ADRs) increase by about 50% with the use of CADe and its related technologies.[5: Barua I, Vinsard DG, Jodal HC, et al. Artificial intelligence for polyp detection during colonoscopy: a systematic review and meta-analysis. Endoscopy. 2021;53:277-284.][6: Hassan C, Spadaccini M, Iannone A, et al. Performance of artificial intelligence in colonoscopy for adenoma and polyp detection: a systematic review and meta-analysis. Gastrointest Endosc. 2021;93:77-85.e6.] Wang et al conducted both nonblinded and blinded RCTs in China,[7: Wang P, Liu X, Berzin TM, et al. Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study. Lancet Gastroenterol Hepatol. 2020.][8: Wang P, Berzin TM, Glissen Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut. 2019;68:1813-1819.] and Repici et al showed the efficacy of CADe in an RCT involving multiple European centers.[9: Repici A, Badalamenti M, Maselli R, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020;159:512-520.e7.] Gong et al[10: Gong D, Wu L, Zhang J, et al. Detection of colorectal adenomas with a real-time computer-aided system (ENDOANGEL): a randomised controlled study. Lancet Gastroenterol Hepatol. 2020;5:352-361.] and Su et al[11: Su JR, Li Z, Shao XJ, et al. Impact of a real-time automatic quality control system on colorectal polyp and adenoma detection: a prospective randomized controlled study (with videos). Gastrointest Endosc. 2020;91:415-424.e4.] evaluated AI-based quality assurance systems that alerted endoscopists to blind spots and excess speed during colonoscopy withdrawal. All of these RCTs showed significantly increased ADRs in the groups allocated to the use of AI. This may indeed contribute to decreased incidence of advanced-stage cancer and thus reduced cancer mortality.[12: Kaminski MF, Wieszczy P, Rupinski M, et al. Increased rate of adenoma detection associates with reduced risk of colorectal cancer and death. Gastroenterology. 2017;153:98-105.]

However, increased detection always involves trade-offs. Overdiagnosis in cancer screening is the detection of a cancer or premalignant tumor that would not cause symptoms or death during an individual’s lifetime without screening.[13: Welch HG, Black WC. Overdiagnosis in cancer. J Natl Cancer Inst. 2010;102:605-613.] Harms of overdiagnosis include anxiety, overtreatment of clinically irrelevant lesions, and unnecessary surveillance, all of which result in considerable patient burden and socioeconomic costs.
With increased AI detection of small lesions, the risk of overdiagnosis increases (Figure 1). Recent analyses have indicated that AI-assisted mammography screening may lead to more false-positive findings and increase costs associated with subsequent testing without improvements in overall cancer detection (Figure 1).[2,4] The financial waste due to these AI-driven overdiagnoses was reported to be $400 million per year in the US.[6] Furthermore, given that mammography itself has a very marginal effect on cancer mortality reduction,[14: Adami HO, Kalager M, Valdimarsdottir U, et al. Time to abandon early detection cancer screening. Eur J Clin Invest. 2019;49:e13062.] the addition of AI is expected to provide an even more marginal (perhaps no) survival benefit. The lack of RCTs with long-term follow-up led to this discouraging situation in AI-aided mammography screening.[15: Guerriero C, Gillan MG, Cairns J, et al. Is computer aided detection (CAD) cost effective in screening mammography? A model based on the CADET II study. BMC Health Serv Res. 2011;11:11.] Application of AI in colonoscopy screening harbors a similar risk.
Colonoscopy has a role as a so-called preventive screening test, in which detection and removal of adenomas is expected to reduce cancer incidence and mortality. However, the lifetime prevalence of colorectal cancer is approximately 5%, whereas that of adenomas is greater than 50%.[16: Kalager M, Wieszczy P, Lansdorp-Vogelaar I, et al. Overdiagnosis in colorectal cancer screening: time to acknowledge a blind spot. Gastroenterology. 2018;155:592-595.] This means that most adenomas that are removed during colonoscopy screening would never have progressed to cancer. Thus, most individuals who undergo adenoma removal during colonoscopy do not experience benefits, but are prone to harms and burdens such as higher cost for colonoscopy, more frequent surveillance colonoscopies, and treatment complications. This trend toward overdiagnosis is likely to be accelerated as CADe detects more small adenomas. The contribution of these small adenomas to future cancer risk is debated.[17: Sekiguchi M, Otake Y, Kakugawa Y, et al. Incidence of advanced colorectal neoplasia in individuals with untreated diminutive colorectal adenomas diagnosed by magnifying image-enhanced endoscopy. Am J Gastroenterol. 2019;114:964-973.] AI does not increase detection of large adenomas or of cancer.[5,6] Therefore, AI-assisted colonoscopy may increase detection of clinically relevant polyps and thus prevent cancer, but will also increase overdiagnosis of polyps with little or no malignant potential.

Owing to the inherent pitfalls related to overdiagnosis caused by CADe tools, a future priority in realizing the potential of AI in cancer screening lies in the development of targeted detection of lesions with clinical consequences. AI tools designed to classify polyps as premalignant vs nonpremalignant, called computer-aided diagnosis (CADx), are under development for colonoscopy and aim at real-time classification of small polyps to determine which polyps need to be removed.[18: Mori Y, Kudo SE, Misawa M, et al. Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy: a prospective study. Ann Intern Med. 2018;169:357-366.][19: Byrne MF, Chapados N, Soudan F, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut. 2019;68:94-100.] These tools may play a future role in mitigating CADe-associated overdiagnosis in cancer screening services and optimizing their cancer prevention effect.[20: Mori Y, Kudo SE, East JE, et al. Cost savings in colonoscopy with artificial intelligence-aided polyp diagnosis: an add-on analysis of a clinical trial (with video). Gastrointest Endosc. 2020;92:905-911.e1.]

Many screening programs have sophisticated quality improvement programs to properly train and certify endoscopists in colonoscopy screening, reducing unwanted variation between doctors who perform screening procedures.
However, there is still considerable variation between doctors in reliably identifying pathology at screening procedures.[16][21: Elmore JG, Jackson SL, Abraham L, et al. Variability in interpretive performance at screening mammography and radiologists’ characteristics associated with accuracy. Radiology. 2009;253:641-651.] In colonoscopy screening, the largest additional gain from increased detection rates may be for examinations performed by physicians who are at the lower end of detection.[12] Conversely, a recent study from the Polish colorectal cancer screening program showed that the risk of colorectal cancer after screening by physicians with high detection rates is already very close to zero for more than 15 years without an AI aid.[22: Pilonis ND, Bugajski M, Wieszczy P, et al. Long-term colorectal cancer incidence and mortality after a single negative screening colonoscopy. Ann Intern Med. 2020;173:81-91.] For high performers, AI-aided detection may provide little additional benefit but may increase the risk for overdiagnosis.[16,22] On the other hand, AI may play its most important role in improving the quality of colonoscopy by suboptimal performers and thus reduce unwanted variation among physicians. Indeed, a recent study on CADe during colonoscopy showed less variation in polyp detection between colonoscopists when an AI aid was introduced.[9]

Today, regulatory approval for AI-based CAD products is mainly based on estimates of detection. In the future, the process of regulatory approval of AI devices needs to reflect the short- and long-term benefits and the downstream harms and burdens. Most AI tools are designated as class II medical devices by the FDA, a designation that entails low to moderate risk and does not require rigorous clinical evaluation. AI tools, however, are powerful devices with equally large prospects of benefits, harms, and burdens. They can profoundly change clinical practice for many millions of individuals in screening programs around the world, for better through increased cancer prevention or for worse through increased false positives, more overdiagnosis, and thus increased patient harms and burden. These issues should be resolved when payers consider reimbursement for AI in colonoscopy in the near future.

Absolute benefits, harms, and burden of AI in cancer screening cannot be assessed without rigorously designed RCTs with long-term follow-up. However, testing of new tools with long-term outcomes in cancer screening is often regarded as too expensive and time-consuming.
To tackle this issue, we propose integrating the trials into ongoing public screening platforms, which provide an ideal setting in which to evaluate the long-term effectiveness of promising AI technologies (so-called learning screening programs[23: Kalager M, Bretthauer M. Improving cancer screening programs. Science. 2020;367:143-144.] or randomized health services studies[24: Kaminski MF, Kraszewska E, Rupinski M, et al. Design of the Polish Colonoscopy Screening Program: a randomized health services study. Endoscopy. 2015;47:1144-1150.]). Screening programs are established in many countries, and most programs have ideal infrastructures for testing, refinement, and implementation and run highly regulated environments with sophisticated quality programs and performance indicators. Therefore, screening programs represent a reliable and inexpensive testing ground for new AI tools.

The concept of “learning screening programs” facilitates quality testing of human-computer interaction in continuous cycles of evidence generation.[23] In learning screening programs, individuals can be randomized individually or in clusters by screening facility or provider to be screened with the standard screening test or the screening test with the most promising AI technologies. Stepped-wedge randomization and platform trials are included in the design toolboxes of learning screening programs.[23] In time (eg, 1 year), the new AI tools would be implemented in the whole screening program for all newly enrolled individuals. This enables integration into cancer screening programs with rigorous testing and rapid implementation at the same time.
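The stepped-wedge option mentioned above can be illustrated with a small sketch. This is only a hypothetical illustration, not the design of any cited screening program: the center names, the number of steps, the seed, and the helper names `stepped_wedge_schedule` and `uses_ai` are all assumptions. Every center starts with the standard test, centers cross over to the AI-assisted test in a random order, and by the last period all centers use the AI tool.

```python
import random


def stepped_wedge_schedule(centers, n_steps, seed=0):
    """Randomly assign each center the period (1..n_steps) at which it
    crosses over from the standard test to the AI-assisted test."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    order = list(centers)
    random.Random(seed).shuffle(order)  # random crossover order
    # Deal centers round-robin into steps so step sizes differ by at most one.
    return {center: 1 + i % n_steps for i, center in enumerate(order)}


def uses_ai(schedule, center, period):
    """True once the center has reached its crossover period."""
    return period >= schedule[center]


# Hypothetical screening centers; period 0 is the all-standard baseline.
centers = [f"center_{k}" for k in range(1, 7)]
schedule = stepped_wedge_schedule(centers, n_steps=3, seed=42)

for period in range(0, 4):
    on_ai = sorted(c for c in centers if uses_ai(schedule, c, period))
    print(f"period {period}: AI-assisted at {on_ai}")
```

Because the crossover order is random, outcomes in centers that have not yet switched serve as concurrent controls for those that have, while the whole program still ends up using the new tool.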
In learning screening programs, individuals can be reliably followed for short-term outcomes such as detection rates and treatment harms and long-term outcomes such as cancer incidence and mortality. If long-term assessment indicates incremental net harms of AI (eg, through increased overdiagnosis) that are not accompanied by a significant reduction in advanced-stage cancer and cancer mortality compared with individuals who are examined with the standard test, the AI tool can be de-implemented to avoid harm or overdiagnosis for future patients.[25: Prasad V, Ioannidis JP. Evidence-based de-implementation for contradicted, unproven, and aspiring healthcare practices. Implement Sci. 2014;9:1.] Although such evaluation takes time, it will not create major additional costs because it is embedded in screening programs and uses existing databases, quality programs, and infrastructure. If randomization is not an option, scientifically evaluable implementation can still be facilitated by stepwise implementation across centers, providers, or geographic areas, combined with modern difference-in-difference analyses and causal inference methodology.

Cancer screening entails a fine line between benefits and harms. AI technologies have great potential to move this line toward more benefits, but also harbor a risk of causing considerable harms. Clinical trials embedded in learning screening programs will be the cornerstone of the process to disentangle this critical question (Table 1). If we do not test, we will never find out.

Table 1. Next Steps to Incorporate Artificial Intelligence Into Colonoscopy Screening
1. Clinical trials embedded in cancer screening programs to disentangle long-term benefits, harms and burden, and cost-effectiveness.
2. Considering the balance between benefits and harms and burden in regulatory approval and reimbursement policies.
3. Adoption of computer-aided diagnosis (CADx) to minimize the increased cost that results from the use of computer-aided detection (CADe).
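The difference-in-difference analysis mentioned in the main text, for settings where randomization is not an option, can be sketched in its simplest two-group, two-period form. This is a minimal illustration under stated assumptions, not the authors' analysis plan: the function name `diff_in_diff` is hypothetical, and the outcome values are made up purely to show the arithmetic. The estimate is only meaningful if the two groups would have changed in parallel without the AI tool.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means:
    (change in the group that adopted AI) minus (change in the group that did not)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up per-center outcome values in arbitrary units, for illustration only.
early_pre, early_post = [10.0, 12.0], [15.0, 17.0]  # centers that adopted the AI tool
late_pre, late_post = [11.0, 13.0], [12.0, 14.0]    # centers still using the standard test

effect = diff_in_diff(early_pre, early_post, late_pre, late_post)
print(f"Difference-in-differences estimate: {effect:.1f}")
```

Subtracting the change in the comparison centers removes time trends shared by both groups, which a simple before-after comparison in the adopting centers would wrongly attribute to the AI tool.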

Highlights

  • Five randomized controlled trials (RCTs) have shown that adenoma detection rates (ADRs) increase by about 50% with the use of computer-aided detection (CADe) and its related technologies[5,6]; Wang et al conducted both nonblinded and blinded RCTs in China,[7,8] and Repici et al proved the efficacy of CADe in an RCT involving multiple European centers.[9]

  • Lifetime prevalence of colorectal cancer is approximately 5%, but greater than 50% for adenomas.[16] This means that most adenomas which are removed during colonoscopy screening would never have progressed to cancer.

  • Most individuals who undergo adenoma removal during colonoscopy do not experience benefits, but are prone to harms and burdens such as higher cost for colonoscopy, more frequent surveillance colonoscopies, and treatment complications. This trend toward overdiagnosis is likely to be accelerated with more detection of small adenomas with the use of CADe.
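The prevalence figures in the highlights above imply a simple bound, worked through here as a rough sketch. It rests on two simplifying assumptions that are not stated in the source: every colorectal cancer arises in a person who had an adenoma, and the "approximately 5%" and "greater than 50%" figures are treated as exactly 0.05 and 0.50. The function name `max_progression_fraction` is hypothetical. The bound applies to persons with adenomas, not to individual polyps.

```python
def max_progression_fraction(cancer_prev: float, adenoma_prev: float) -> float:
    """Upper bound on the share of adenoma carriers who ever develop cancer,
    assuming every cancer occurs in someone who had an adenoma."""
    if not 0 < adenoma_prev <= 1:
        raise ValueError("adenoma_prev must be in (0, 1]")
    if not 0 <= cancer_prev <= adenoma_prev:
        raise ValueError("cancer_prev must be between 0 and adenoma_prev")
    return cancer_prev / adenoma_prev


# Lifetime prevalences taken from the text: about 5% for cancer, more than 50% for adenomas.
upper = max_progression_fraction(0.05, 0.50)
print(f"At most {upper:.0%} of adenoma carriers develop cancer; "
      f"at least {1 - upper:.0%} never do.")
```

Because adenoma prevalence is reported as greater than 50%, the true share of carriers who progress would be lower still under these assumptions.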


Summary

Hopes and Hypes for Artificial Intelligence in Colorectal Cancer Screening

There is great interest in artificial intelligence (AI) in health care. As with many innovations in medicine, there is a fine line between potential benefits and harms with the use of AI. This is especially important for preventive services such as cancer screening, where the trade-off between benefits and harms is delicate and involves millions of individuals. This Commentary clarifies clinical challenges of AI in cancer screening programs and explores solutions that can enable its optimal implementation.

[Figure (image not reproduced). Labels: Detection of Disease; Regulatory Approval; Learning Screening Programs; Findings; Conclusion. Arrows with pattern fill indicate expected outcomes.]