Abstract

The cornerstone of artificial intelligence (AI) is its inherent ability to learn from past experience, thus mimicking the human mind and deviating from traditional approaches based on predefined rules. The (machine) learning process involves showing examples to the algorithm for training purposes; its performance is then assessed on new, unseen examples. Specifically, during the training phase the algorithm learns the dependencies present in the data, and these learned dependencies are then used to predict outcomes for new data. Data mining is the process encompassing the lower-level programming steps taken to discover interesting and useful patterns in large volumes of complex data. Figure 1 depicts schematically the relationship among AI, machine learning and data mining within the overall task of data exploration and exploitation. A key distinction between machine learning and human learning is that humans are able to extrapolate knowledge, conjecturing complex associations and rules from relatively small and incomplete amounts of data, whereas machine learning algorithms, by design, achieve better performance as the amount of data increases. Moreover, machine learning algorithms are able to process and assimilate data from far more patients than a single physician can manage and comprehend in an entire career.

Even though AI and its branches have been around for several decades, only recently have they grown at an extraordinary pace in the healthcare domain. The reason for this is essentially twofold: (i) the accumulation of increasing volumes and complexity of healthcare data via electronic medical records and (ii) the fact that AI algorithms have gradually become faster, more effective and, to a certain extent, explainable. The first medical specialties that benefited from the adoption of AI techniques were imaging and oncology, followed by neurology and cardiology. The number of publications in respiratory medicine that utilize AI techniques has been increasing exponentially in the last few years,1 with chronic obstructive pulmonary disease (COPD) accounting for a large share of these studies. COPD constitutes an ideal target for AI for several reasons: it is a complex progressive disease encompassing obscure gene–environment interactions; its manifestations fluctuate greatly in time, scale and dimension; and affected patients undergo numerous tests producing clinical, imaging, genomic, metabolomic and proteomic data, with spirometry and imaging (especially computed tomography [CT] scans) being the most integral. It should also be highlighted that, as COPD is a chronic condition, the majority of data are time-course data, which adds an extra layer of complexity that AI has proven able to handle effectively by identifying data trajectories. In the literature, AI applications in COPD research span all aspects of the disease, from diagnosing patients with COPD and classifying them into meaningful categories, to orchestrating their treatment and overall disease management, and eventually capturing the progression and prognosis of the disease.
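As a concrete illustration of the train-and-evaluate workflow outlined at the start of this section, the minimal Python sketch below (assuming scikit-learn is available) fits a classifier on labelled examples and then measures its accuracy on held-out, unseen data. The feature matrix, labels and model choice are purely synthetic placeholders and do not correspond to any of the studies cited here.

```python
# Minimal sketch of the training/evaluation cycle described above.
# All data are synthetic; nothing here reflects the cited algorithms.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                  # 500 synthetic "patients", 10 features each
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # hypothetical binary outcome label

# Training phase: the algorithm learns dependencies from labelled examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluation phase: performance is assessed on new, unseen examples.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```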
In terms of COPD diagnosis, where spirometric evaluation is essential, an AI algorithm was recently proposed2 for two tasks: recognizing the pattern of pulmonary function tests (PFTs), where it yielded perfect accuracy compared with 75% achieved by pulmonologists, and assigning a potential diagnosis for a patient encounter given the PFT and certain clinical information, where the accuracy of the AI software and the pulmonologists was 82% and 45%, respectively. The algorithm was trained on a sizable and diverse data set of approximately 1500 patients and was validated in a prospectively collected cohort of 50 well-annotated patients. This study highlights two significant factors of the AI process: both the quantity and the quality of data are crucial. As these algorithms are essentially data-driven, their performance improves as more training data become available, thus assimilating the collective knowledge of large and diverse patient sets. Moreover, machine learning algorithms are highly dependent on the quality of the input data and the respective annotation, conforming to the cliché ‘garbage in, garbage out’.

Another critical issue in COPD is the paradox of the simultaneous underdiagnosis and overdiagnosis of the disease.3 In a recent study, deep learning was applied to low-dose CTs for the automated detection of COPD, resulting in an overall area under the curve of 0.89, as calculated on an independent test set of more than 2000 CT scans.4 Validation on external sets contributes to the unbiased assessment of an algorithm's performance and provides a more reliable estimate of its actual performance in the clinical setting. The analysis of imaging data, and specifically CT scans, is perhaps the field in which AI is expected to shine in COPD research, given the volume and complexity of the data. To this end, the contribution of deep learning models is invaluable, as they are able to capture the dependencies in big data (in terms of samples, parameters or both) better than traditional machine learning algorithms.

In terms of COPD subtype characterization, COPDGene is to date the most comprehensive and extensive database, encompassing 10-year longitudinal clinical, imaging and genomic data from patients diagnosed with COPD of variable severity5; its primary purpose is to relate COPD phenotypes with underlying molecular and genetic patterns. COPDGene has also highlighted the importance of acquiring and analysing time-course data in order to capture the progressive and chronic nature of the disease. Regarding the prognosis of COPD, a machine learning mortality prediction algorithm was recently proposed, utilizing clinical, spirometric and imaging data from 3900 patients from the COPDGene and ECLIPSE cohorts.6 Top predictors of mortality were 6-min walk distance, forced expiratory volume in 1 s % predicted, age and pulmonary artery-to-aorta ratio. The machine learning model outperformed previously existing composite indexes (BODE [Body mass index, airflow Obstruction, Dyspnoea and Exercise capacity], BODE modifications and ADO [Age, Dyspnoea, airflow Obstruction]) in mortality prediction, using fewer predictors, suggesting a potential implication for future practice. Overall, COPD and respiratory research have lately been ‘flooded’ with AI techniques, the majority of which rely on deep learning.
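To make the notion of external validation more tangible, the hedged sketch below (again assuming scikit-learn; both cohorts are synthetic stand-ins rather than the CT, COPDGene or ECLIPSE data discussed above) trains a model on one cohort and reports the area under the ROC curve on a separate, independent cohort.

```python
# Illustrative sketch of external validation with ROC AUC: a model trained on one
# cohort is scored on an independent cohort it never saw during training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_cohort(n, shift=0.0):
    """Create a synthetic cohort; `shift` mimics a distribution difference between sites."""
    X = rng.normal(loc=shift, size=(n, 5))
    y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_dev, y_dev = make_cohort(1000)            # development cohort used for training
X_ext, y_ext = make_cohort(400, shift=0.3)  # external cohort, never seen during training

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

# AUC on the external set gives a less biased estimate of real-world performance.
auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"external-validation AUC: {auc:.2f}")
```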
An important advantage of AI in COPD, and in medicine overall, is that the algorithm works in the same manner every time it is invoked and is not affected or biased by the urgent pressure of the clinical setting that is often imposed on medical doctors. A trained AI algorithm can provide an answer for a specific task under consideration in a limited time frame, often within a fraction of a second. Nevertheless, for all such techniques and for the foreseeable future, the final decision and responsibility remain in the hands of the treating physicians, supported by the computing power of the AI algorithm, ultimately targeting the benefit of the patient.

Dr Konstantinos Kostikas reports grants, personal fees and non-financial support from AstraZeneca, Boehringer Ingelheim, Chiesi, ELPEN, GSK, Menarini and Novartis; grants from NuvoAir; and personal fees from Sanofi, outside the submitted work. Dr Konstantinos Kostikas was an employee and shareholder of Novartis Pharma AG until 31 October 2018.
