The aging and multimorbid population and health personnel shortages pose a substantial burden on primary health care. While predictive machine learning (ML) algorithms have the potential to address these challenges, concerns include transparency and insufficient reporting of model validation and effectiveness of the implementation in the clinical workflow. To systematically identify predictive ML algorithms implemented in primary care from peer-reviewed literature and US Food and Drug Administration (FDA) and Conformité Européene (CE) registration databases and to ascertain the public availability of evidence, including peer-reviewed literature, gray literature, and technical reports across the artificial intelligence (AI) life cycle. PubMed, Embase, Web of Science, Cochrane Library, Emcare, Academic Search Premier, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI.org (Association for the Advancement of Artificial Intelligence), arXiv, Epistemonikos, PsycINFO, and Google Scholar were searched for studies published between January 2000 and July 2023, with search terms that were related to AI, primary care, and implementation. The search extended to CE-marked or FDA-approved predictive ML algorithms obtained from relevant registration databases. Three reviewers gathered subsequent evidence involving strategies such as product searches, exploration of references, manufacturer website visits, and direct inquiries to authors and product owners. The extent to which the evidence for each predictive ML algorithm aligned with the Dutch AI predictive algorithm (AIPA) guideline requirements was assessed per AI life cycle phase, producing evidence availability scores. The systematic search identified 43 predictive ML algorithms, of which 25 were commercially available and CE-marked or FDA-approved. The predictive ML algorithms spanned multiple clinical domains, but most (27 [63%]) focused on cardiovascular diseases and diabetes. Most (35 [81%]) were published within the past 5 years. The availability of evidence varied across different phases of the predictive ML algorithm life cycle, with evidence being reported the least for phase 1 (preparation) and phase 5 (impact assessment) (19% and 30%, respectively). Twelve (28%) predictive ML algorithms achieved approximately half of their maximum individual evidence availability score. Overall, predictive ML algorithms from peer-reviewed literature showed higher evidence availability compared with those from FDA-approved or CE-marked databases (45% vs 29%). The findings indicate an urgent need to improve the availability of evidence regarding the predictive ML algorithms' quality criteria. Adopting the Dutch AIPA guideline could facilitate transparent and consistent reporting of the quality criteria that could foster trust among end users and facilitating large-scale implementation.
Read full abstract