Abstract

Artificial intelligence (AI) and machine learning (ML) software have the potential to improve patient care. An underlying algorithm can either be locked, so that its function does not change, or adaptive, meaning that the AI and ML system performs continual learning. Continual learning, also known as lifelong learning, is a technique in which the decision logic of mathematical models is updated with new data while previously learned knowledge is retained.1,2 By contrast, locked AI and ML systems cannot learn from post-approval, real-world data and thus cannot improve over time in the same way as adaptive systems. Continual learning has long been a part of computer science.1,2 However, no medical device based on AI and ML continual learning has yet been approved by the US Food and Drug Administration (FDA).2,3 It is likely that the FDA will have to make a decision about such a device in the near future.4

Continual learning has already been introduced in other sectors. For example, Tesla continually updates its cars' autopilot software on the basis of feedback data aggregated from its fleet of approximately 500 000 vehicles.5 This example shows the potential of continual learning to create more advanced AI and ML-based medical devices, with updates based on new data allowing performance to improve. These improvements could include personalisation or the removal of errors, which would lead to more accurate outcomes. In locked systems, algorithms are trained on a specific dataset. Such systems often perform well on similar data, but can perform poorly in scenarios that were rare during training.6
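To make the distinction between a locked and an adaptive algorithm concrete, the short sketch below contrasts a model that is frozen after its initial training with one that folds a batch of post-approval data into its parameters through an incremental update. It is a minimal sketch only: the synthetic data, the scikit-learn classifier, and the single update step are illustrative assumptions, not a description of any approved device.

```python
# Minimal sketch, not from the Comment or any cited work: synthetic data and a
# simple scikit-learn classifier are used only to contrast "locked" with
# "adaptive" behaviour.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical pre-approval training data, a post-approval batch, and a
# separate evaluation set.
X_train = rng.normal(size=(1000, 10))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_post = rng.normal(size=(500, 10))
y_post = (X_post[:, 0] + X_post[:, 1] > 0).astype(int)
X_eval = rng.normal(size=(500, 10))
y_eval = (X_eval[:, 0] + X_eval[:, 1] > 0).astype(int)

# Locked system: trained once and then frozen; post-approval data are never used.
locked = SGDClassifier(random_state=0).fit(X_train, y_train)

# Adaptive system: starts from the same model, then folds in the post-approval
# batch through an incremental (continual-learning) update.
adaptive = SGDClassifier(random_state=0).fit(X_train, y_train)
adaptive.partial_fit(X_post, y_post)

print("locked accuracy:  ", locked.score(X_eval, y_eval))
print("adaptive accuracy:", adaptive.score(X_eval, y_eval))
```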
By collecting post-approval, real-world data, AI and ML-based medical devices could be updated through continual learning after authorisation and could potentially support improved health outcomes. However, continual learning poses risks that need to be addressed. First, training on new data can introduce new errors. New data that are subject to errors, such as reporting errors (eg, a wrong age or diagnosis entered in electronic health records), could lead to inaccurate outcomes. Second, system performance might deteriorate if the newly integrated data are biased.7 An example of this type of error, also referred to as a domain shift, is an AI and ML system developed with data from both white and black patients for which post-approval, real-world data are collected disproportionately from white patients, which might eventually decrease the accuracy of the system's output for black patients. Third, there is a risk that new information could interfere with what the model has already learned (also referred to as catastrophic forgetting).8 Catastrophic forgetting might, in the worst case, overwrite the model's previous knowledge and lead to a deterioration in performance.1
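One family of mitigations for catastrophic forgetting is rehearsal, in which a stored sample of earlier training data is interleaved with each new batch so that an update is not driven by new data alone. The sketch below illustrates this idea under stated assumptions: the synthetic data, the scikit-learn classifier, the number of epochs, and the buffer size are hypothetical choices made for this example, not part of the cited work or of any regulatory requirement.

```python
# Minimal rehearsal sketch under the assumptions stated above. Two synthetic
# "tasks" with deliberately different label rules mimic a shift between
# pre-approval and post-approval data.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

X_old = rng.normal(size=(1000, 5))          # pre-approval data
y_old = (X_old[:, 0] > 0).astype(int)
X_new = rng.normal(size=(1000, 5))          # post-approval data, different rule
y_new = (X_new[:, 1] > 0).astype(int)

def update(model, X, y, n_epochs=5):
    """Apply several incremental passes over one batch of data."""
    for _ in range(n_epochs):
        model.partial_fit(X, y, classes=[0, 1])
    return model

# Naive continual update: train on the new batch only; knowledge of the old
# task can be overwritten (catastrophic forgetting).
naive = update(SGDClassifier(random_state=0), X_old, y_old)
naive = update(naive, X_new, y_new)

# Rehearsal update: interleave a stored buffer of old examples with the new batch.
buffer_idx = rng.choice(len(X_old), size=300, replace=False)
X_mix = np.vstack([X_new, X_old[buffer_idx]])
y_mix = np.concatenate([y_new, y_old[buffer_idx]])
rehearsal = update(SGDClassifier(random_state=0), X_old, y_old)
rehearsal = update(rehearsal, X_mix, y_mix)

print("old-task accuracy, naive update:    ", accuracy_score(y_old, naive.predict(X_old)))
print("old-task accuracy, rehearsal update:", accuracy_score(y_old, rehearsal.predict(X_old)))
```

Rehearsal is only one of several strategies reviewed in the continual-learning literature;1 the point here is simply that how previous knowledge is preserved is an algorithmic choice that a change protocol could make explicit.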
In May 2019, the FDA published a discussion paper highlighting that one of the benefits of AI and ML-based software is its ability to learn from real-world feedback and to improve its performance.9 In January 2021, the FDA issued an action plan to facilitate innovation through AI and ML-based medical software.4 One major step of the action plan concerns the so-called predetermined change control plan in the authorisation process for continual learning medical devices. This plan is to set out which aspects the manufacturer intends to change through learning (the prespecifications), as well as the methodology used to implement those changes in a controlled manner that manages risks to patients (the algorithm change protocol).4 The plan provides a clear pathway that would allow the FDA to authorise continual learning in AI and ML-based medical devices.

The inherent risks of continual learning systems, as well as their benefits, mean that it is important for the FDA to take a cautious approach to regulating them. At this stage, the most pressing issue will be to determine the prerequisites in the predetermined change control plan regarding how, and in which aspects, the manufacturer might change the AI and ML-based medical device after authorisation. The FDA intends to specify this information in a draft guidance.4 A crucial principle is to ensure that the introduction of continual learning systems does not lead to a reduction in medical device performance. To reach this goal, the FDA and the manufacturer should determine, during the review process, how the device's performance could be further improved through learning, for example by identifying and addressing specific errors. For training and updating of the AI and ML system, the prespecifications should determine which post-approval data are relevant and should be selected. The newly integrated data should be accurate and unbiased; otherwise, there is a risk that mistakes could be integrated into the AI and ML system. Flexibility might be appropriate in some cases. For example, in radiology, post-approval images can be useful even when the correctness of the medical diagnoses cannot be ensured or when diagnoses are missing altogether. Such image data can be used for pretraining, before the AI and ML system is fine-tuned with correctly labelled data, which might result in better overall performance.10 Post-approval monitoring is necessary and should, in particular, ensure that the performance of the AI and ML system does not degrade over time as patient demographics and clinical practice change.

The algorithm change protocol determines the methodology applied to achieve the prespecifications. To ensure that continual learning maintains or improves performance and does not lead to unwanted changes in the algorithm's inherent decision logic, whether by overwriting existing inference or by introducing new errors, standardised testing routines are recommended.2 Testing routines ensure that the current performance level of the AI and ML system is maintained and that the removal of any errors has been successful. Additionally, given the probabilistic nature of contemporary AI and ML systems, testing routines should check that performance on patient subgroups remains robust.
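As an illustration of what such a standardised testing routine could look like, the sketch below describes a simple promotion gate: an updated model replaces the deployed one only if it matches or exceeds the deployed model's performance on a fixed held-out test set, both overall and within every predefined patient subgroup. The function names, subgroup definitions, metric, and tolerance are hypothetical assumptions for this example; an actual algorithm change protocol would have to specify them explicitly.

```python
# Minimal sketch of a hypothetical update gate; names, metric, and tolerance
# are assumptions for illustration, not requirements drawn from the FDA action plan.
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A scoring function takes a test set (features, labels) and returns a single
# performance number where higher is better (eg, accuracy or AUROC).
ScoreFn = Callable[[Sequence, Sequence], float]

@dataclass
class GateResult:
    passed: bool                  # True if the update may be deployed
    details: Dict[str, float]     # per-group scores, kept for the audit trail

def update_gate(
    deployed_score: ScoreFn,
    candidate_score: ScoreFn,
    X: Sequence,
    y: Sequence,
    subgroups: Dict[str, List[int]],
    tolerance: float = 0.0,
) -> GateResult:
    """Decide whether an updated (candidate) model may replace the deployed one.

    `subgroups` maps a label (eg, a demographic group or clinical site) to the
    indices of the held-out test set belonging to that group.
    """
    details: Dict[str, float] = {}
    passed = True
    checks = [("overall", list(range(len(y))))] + list(subgroups.items())
    for name, idx in checks:
        X_g = [X[i] for i in idx]
        y_g = [y[i] for i in idx]
        before = deployed_score(X_g, y_g)
        after = candidate_score(X_g, y_g)
        details[f"{name}/deployed"] = before
        details[f"{name}/candidate"] = after
        # Reject the update if any checked group degrades beyond the tolerance.
        if after < before - tolerance:
            passed = False
    return GateResult(passed=passed, details=details)
```

Gating on every predefined subgroup, rather than on overall performance alone, is one way to operationalise the subgroup-robustness check described above, and logging the per-group scores creates the kind of audit trail that post-approval monitoring would rely on.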
Finally, to mitigate the risk of hacking attacks during updates, it is crucial that the connection (eg, WiFi or Ethernet) is encrypted, verified, and secure.

The FDA's action plan on continual learning is intended to benefit patients. However, risks such as the introduction of new errors need to be addressed in the predetermined change control plan.

All authors had final responsibility for the decision to submit for publication.

KNV reports grants from the Swiss National Science Foundation (SNSF) during the writing of the submitted work, and grants from the SNSF and the Swiss Cancer Research Foundation outside the submitted work. SF reports grants from the SNSF during the writing of the submitted work and grants from the SNSF outside the submitted work. ASK reports grants from Arnold Ventures during the writing of the submitted work and outside the submitted work. The funding sources had no role in the content of this Comment or the decision to publish it.

References

1 Parisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong learning with neural networks: a review. Neural Netw 2019; 113: 54-71.
2 Lee CS, Lee AY. Clinical applications of continual learning machine learning. Lancet Digit Health 2020; 2: e279-e281.
3 Rivera SC, Liu X, Chan A-W, Denniston AK, Calvert MJ. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. BMJ 2020; 370: m3210.
4 FDA. Artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) action plan. January 2021. https://www.fda.gov/media/145022/download (accessed April 16, 2021).
5 Towards Data Science. Tesla's deep learning at scale: using billions of miles to train neural networks. May 7, 2019. https://towardsdatascience.com/teslas-deep-learning-at-scale-7eed85b235d3 (accessed April 16, 2021).
6 Oren O, Gersh BJ, Bhatt DL. Artificial intelligence in medical imaging: switching from radiographic pathological data to clinically meaningful endpoints. Lancet Digit Health 2020; 2: e486-e488.
7 Kaushal A, Altman R, Langlotz C. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA 2020; 324: 1212-1213.
8 Kirkpatrick J, Pascanu R, Rabinowitz N, et al. Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci USA 2017; 114: 3521-3526.
9 FDA. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD). https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf (accessed April 16, 2021).
10 Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med 2019; 25: 24-29.
