Abstract

Electronic health records (EHRs) are records of patients' clinical information. EHRs have been widely used in disease diagnosis and therapy because of the large amount of valuable medical information they contain. However, missing data in EHRs hinders their use. One common imputation approach replaces missing values with the mean of each feature, but this method weakens feature importance. In this study, we use the expectation-maximization (EM) algorithm to impute the missing data in EHRs. Several machine learning models, namely an artificial neural network, logistic regression, a support vector machine, and random forests, are used to evaluate the effectiveness of the imputation. The experimental results show that these models achieve higher cancer prediction accuracy on EHRs imputed by the EM algorithm than on EHRs imputed with mean values. This indicates that the EM algorithm provides accurate estimates when imputing missing EHR data.
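To make the contrast between mean imputation and EM imputation concrete, the following is a minimal sketch in Python with NumPy. It assumes that the rows are independent draws from a multivariate Gaussian and that values are missing at random; the abstract does not state these assumptions, and the function name `em_impute`, its parameters, and the small diagonal ridge are hypothetical illustrative choices, not the authors' implementation. The E-step fills each missing entry with its conditional mean given the observed entries in that row. The M-step re-estimates the mean vector and covariance matrix, including the conditional covariance of the filled-in entries.

```python
import numpy as np


def em_impute(X, max_iter=100, tol=1e-6, ridge=1e-6):
    """Impute NaNs in X assuming rows are i.i.d. multivariate Gaussian.

    Hypothetical sketch: returns (X_imputed, mu, sigma).
    `ridge` is an assumed small diagonal term added for numerical stability.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Start from the mean-imputation baseline.
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(miss, mu, X)
    sigma = np.cov(X_hat, rowvar=False, bias=True) + ridge * np.eye(p)

    for _ in range(max_iter):
        # Accumulated conditional covariance of the imputed entries.
        C = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue  # fully observed row: nothing to impute
            o = ~m
            if not o.any():
                # Entirely missing row: best guess is the current mean.
                X_hat[i] = mu
                C += sigma
                continue
            # E-step: conditional mean/covariance of missing given observed.
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            K = np.linalg.solve(s_oo, s_mo.T).T  # K = s_mo @ inv(s_oo)
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ s_mo.T
        # M-step: re-estimate mean and covariance from the completed data.
        mu_new = X_hat.mean(axis=0)
        diff = X_hat - mu_new
        sigma_new = (diff.T @ diff + C) / n + ridge * np.eye(p)
        converged = (np.max(np.abs(mu_new - mu)) < tol
                     and np.max(np.abs(sigma_new - sigma)) < tol)
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return X_hat, mu, sigma
```

Mean imputation assigns the same constant to every missing entry of a feature and ignores its correlation with the observed features. The EM sketch instead uses the estimated covariance to predict each missing entry from the values that are present in the same row, so strongly correlated features yield much closer estimates.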

