Scalable Healthcare Assessment for Diabetic Patients Using Deep Learning on Multiple GPUs

Daniel Sierra-Sosa, Adel Elmaghraby, Begonya Garcia-Zapirain, Maider Urtaran-Laresgoiti, Ibon Oleagordia, Roberto Nuno-Solinis, Cristian Castillo

doi:10.1109/tii.2019.2919168

Abstract

The large-scale parallel computation that has become available with the new generation of graphics processing units (GPUs) and cloud-based services can be exploited for healthcare data analysis. Furthermore, computation workstations suited for deep learning are usually equipped with multiple GPUs, allowing the workload for larger datasets to be distributed among them while exploiting the parallelism within each GPU. In this paper, we utilize distributed and parallel computation techniques to efficiently analyze healthcare data using deep learning. We demonstrate the scalability and computational benefits of this approach with a case study of the longitudinal assessment of approximately 150 000 type 2 diabetic patients. Type 2 diabetes mellitus (T2DM) is the fourth leading cause of mortality worldwide, with rising prevalence. T2DM leads to adverse events such as acute myocardial infarction, major amputations, and avoidable hospitalizations. This paper aims to relate laboratory and medical assessment variables to the occurrence of these adverse events and to predict them using machine learning techniques. We use a raw database provided by the Basque Health Service, Spain, to conduct this study. This database contains 150 156 patients diagnosed with T2DM, for whom 321 laboratory and medical assessment variables recorded over four years are available. Predictions of adverse events in T2DM patients were performed using both classical machine learning and deep learning techniques and were evaluated using accuracy, precision, recall, and F1-score as metrics. For the prediction of acute myocardial infarction, the best performance is obtained by linear discriminant analysis (LDA) and support vector machines (SVM), in both balanced and weighted models, with an accuracy of 97%; for hospital admission for avoidable causes, the best performance is obtained by balanced LDA and balanced SVM, both with an accuracy of 92%.
For the prediction of the incidence of at least one adverse event, the model with the best performance is the recurrent neural network trained with a balanced dataset, with an accuracy of 94.6%. Performing and comparing these experiments was made possible by a workstation with multiple GPUs. This setup allows for scalability to larger datasets. Such models are also cloud-ready and can be deployed on similar architectures hosted on AWS for even larger datasets.
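
As an illustration of why the evaluation reports precision, recall, and F1-score alongside accuracy, the following minimal Python sketch computes the four metrics for a binary adverse-event label. It is not the authors' code: the function name `binary_metrics` and the toy labels (100 hypothetical patients, 5 of whom have an adverse event) are assumptions made purely for illustration and do not come from the Basque Health Service database.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed events

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data (assumption): 5 patients with an event, 95 without.
y_true = [1] * 5 + [0] * 95

# A trivial predictor that never flags an adverse event.
always_negative = [0] * 100

# A hypothetical model that catches 4 of the 5 events and raises 6 false alarms.
screening = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

baseline = binary_metrics(y_true, always_negative)
model = binary_metrics(y_true, screening)

print("baseline:", baseline)
print("model:   ", model)
```

On this toy data the trivial predictor has the higher accuracy even though it detects no events, which is why accuracy figures on imbalanced outcomes are best read together with recall and F1-score, and why balanced or weighted training sets are a common remedy.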

Full Text