Abstract

Supply and demand increase in response to healthcare trends. Moreover, personal health records (PHRs) are increasingly managed by individuals. Such records are collected through different avenues and vary considerably in type and scope depending on the particular circumstances. As a result, some data may be missing, which has a negative effect on data analysis, and such data should therefore be replaced with appropriate values. In this study, a method for estimating missing data using a multi-modal autoencoder applied to the field of healthcare big data is proposed. The proposed method uses a stacked denoising autoencoder to estimate the missing data that occur during the data collection and processing stages. Autoencoders are neural networks that output a value x̂ similar to the input value x. In the present study, data from the Korean National Health Nutrition Examination Survey (KNHNES), conducted by the Korea Centers for Disease Control and Prevention (KCDC), are used. As representative healthcare data from South Korea, they contain a large number of parameters identical to those used in PHRs. Based on this, models can be generated to estimate missing data occurring in PHRs. Furthermore, PHRs are multi-modal: the data for a single object can be collected from multiple sources. Therefore, the stacked denoising autoencoder is configured under a multi-modal setting. Through pre-processing, a dataset without missing values is constructed from the KNHNES. During training on this dataset, the original data serve as the label, and the autoencoder input is a noised copy in which randomly chosen values are set to zero, with the number of zeros determined by the noise factor. In this way, the autoencoder learns to map the zeroed noise values back to values close to the original labels.
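
The construction of the training pairs described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the noise factor is the approximate fraction of entries set to zero, and it uses synthetic values in place of KNHNES data. The function name `add_zero_mask_noise` and the array sizes are hypothetical.

```python
import numpy as np

def add_zero_mask_noise(x, noise_factor, rng):
    """Return a copy of x in which each entry is zeroed with probability noise_factor.

    Assumption: the noise factor is read as the expected fraction of zeroed entries.
    """
    mask = rng.random(x.shape) < noise_factor  # True where a value is dropped
    noisy = np.where(mask, 0.0, x)             # zero the dropped entries, keep the rest
    return noisy, mask

rng = np.random.default_rng(0)

# Synthetic stand-in for a complete (no missing values) dataset: 1000 records, 80 features.
# Values are shifted above zero so that a zero can only come from the injected noise.
original = rng.random((1000, 80)) + 0.1

# Noised input for the autoencoder; the untouched original acts as the label.
noisy_input, mask = add_zero_mask_noise(original, noise_factor=0.25, rng=rng)
label = original

print("fraction of zeroed entries:", mask.mean())
```

A denoising autoencoder would then be trained to reproduce `label` from `noisy_input`, so that at inference time a missing value encoded as zero is replaced by the network's reconstruction.
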
When the amount of missing data in a dataset reaches approximately 25%, the accuracy of the proposed method using a multi-modal stacked denoising autoencoder is 0.9217, which is higher than that achieved by other ordinary methods. For a single-modal denoising autoencoder, the accuracy is 0.932, a difference of approximately 0.01, which falls within the allowable limits in data analysis. In terms of computational performance, the single-modal autoencoder has 10,384 parameters, which is 5,594 more than the 4,790 used by the multi-modal stacked autoencoder. The number of parameters affects the speed of the model. The two models differ significantly in the number of parameters but only slightly in accuracy, suggesting that the proposed multi-modal stacked denoising autoencoder is advantageous over a single-modal model when used on a personal device. Moreover, a multi-modal model can save additional time when processing large amounts of data in locations such as hospitals and institutions.
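
The parameter comparison can be made concrete with a short count of weights and biases in fully connected layers. The sketch below is illustrative only, because the source does not state the layer widths. An 80-64-80 layout is one configuration whose count equals the reported 10,384, and the per-modality split shown is purely hypothetical and does not reproduce the multi-modal figure (the reported difference of 5,594 implies 10,384 - 5,594 = 4,790).

```python
def dense_params(layer_sizes):
    """Count weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One layout that matches the reported single-modal count (assumed, not from the source).
single_modal = dense_params([80, 64, 80])  # 10384

# Hypothetical split of the 80 features into three modalities, each with its own
# small autoencoder branch: (input width, hidden width) per branch.
hypothetical_branches = [(30, 24), (30, 24), (20, 16)]
multi_modal_hypothetical = sum(dense_params([n, h, n]) for n, h in hypothetical_branches)

print("single-modal parameters:", single_modal)
print("hypothetical multi-modal parameters:", multi_modal_hypothetical)
```

The count shrinks because each branch connects only the features of its own modality to a narrower hidden layer, instead of connecting all 80 features to one wide hidden layer.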

Highlights

  • Healthcare big data involve complex relationships among the different parameters and are adaptable to changes in the surroundings

  • A total of 80 parameters are selected from the preprocessed Korean National Health Nutrition Examination Survey (KNHNES) data

  • The results show that the accuracy of the proposed method is 0.9321 when a noise factor of 0.25 is applied

Summary

Introduction

Healthcare big data involve complex relationships among the different parameters and are adaptable to changes in the surroundings. KNHNES [16] data can be classified into health, health examination, and nutritional survey data. A method for estimating missing data using a multi-modal stacked denoising autoencoder in the field of healthcare big data is proposed.

HANDLING OF MISSING DATA USING MULTI-MODAL STACKED DENOISING AUTOENCODER IN HEALTHCARE BIG DATA

The associate editor coordinating the review of this manuscript and approving it for publication was Shuihua Wang.

