Abstract

Medical data often contain missing values, so imputation has become an important issue. Many imputation methods used in previous studies, such as expectation-maximization and regression-based imputation, assume that the variables follow a multivariate normal distribution. This assumption may bias the results and can become a bottleneck. Directly deleting instances with missing values also causes problems, such as losing important data, producing invalid research samples, and biasing the findings. Therefore, this study proposed a safe-region imputation method for handling medical data with missing values. We also built a medical prediction model and compared deleting instances with missing values against imputation methods in terms of the generated rules, accuracy, and AUC. First, we used kNN imputation, multiple imputation, and the proposed imputation to fill in the missing data and then applied four attribute selection methods to select the important attributes. Next, we used the decision tree (C4.5), random forest, REP tree, and LMT classifiers to generate the rules, accuracy, and AUC for comparison. Because four of the datasets had imbalanced (asymmetric) classes, the AUC was an important criterion. In the experiment, we collected four open medical datasets from UCI and one international stroke trial dataset. The results show that the proposed safe-region imputation outperforms the other listed imputation methods, and that imputing gives better results than directly deleting instances with missing values in terms of the number of rules, accuracy, and AUC. These results provide a reference for medical stakeholders.
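
To make the contrast between imputation and deletion concrete, the following is a minimal sketch of generic kNN imputation next to listwise deletion. It is not the proposed safe-region method, which the abstract does not specify. The function names, the toy data, the choice of k = 2, and the distance (Euclidean over shared observed columns) are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    columns that both rows have observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed column, cannot compare
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have column j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


def listwise_delete(rows):
    """Drop every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r)]


# Hypothetical toy data: None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [0.9, 2.2, None],
]
filled = knn_impute(data, k=2)   # keeps all four rows
kept = listwise_delete(data)     # keeps only the two complete rows
```

On this toy input, deletion discards half of the instances, while imputation retains all of them. That loss of data is the concern the abstract raises about directly deleting instances.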

Highlights

  • Due to advances in medical knowledge and technology, human life expectancy has increased significantly, and health has become increasingly important for everyone

  • We ranked the selected attributes by weight value for each attribute selection method and summed the four rankings to produce a combined ranking, as shown in the last column of Table 4

  • We determined whether the weight value of each attribute selection method was significant
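
The rank-summing step mentioned in the highlights can be sketched as follows. This is an illustrative reading, not the authors' code: the attribute names and weights are hypothetical, the four weight tables stand in for the four unnamed attribute selection methods, and ties are broken by input order.

```python
def rank_descending(weights):
    """Map each attribute to its rank, where 1 is the largest weight."""
    ordered = sorted(weights, key=lambda name: -weights[name])
    return {name: position for position, name in enumerate(ordered, start=1)}


def aggregate_rankings(weight_tables):
    """Sum per-method ranks and order attributes by the smallest rank sum."""
    per_method = [rank_descending(w) for w in weight_tables]
    attributes = list(per_method[0])
    rank_sums = {a: sum(r[a] for r in per_method) for a in attributes}
    final_order = sorted(rank_sums, key=lambda a: rank_sums[a])
    return rank_sums, final_order


# Hypothetical weights from four attribute selection methods.
weight_tables = [
    {"age": 0.9, "bp": 0.5, "glucose": 0.1},
    {"age": 0.4, "bp": 0.7, "glucose": 0.2},
    {"age": 0.8, "bp": 0.3, "glucose": 0.6},
    {"age": 0.6, "bp": 0.5, "glucose": 0.1},
]
rank_sums, final_order = aggregate_rankings(weight_tables)
```

A lower rank sum means the attribute was placed near the top by more of the methods, so the combined ordering favors attributes on which the methods agree.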

Summary

Introduction

Due to advances in medical knowledge and technology, human life expectancy has increased significantly, and health has become increasingly important for everyone. As the population ages, 56.9 million deaths occurred worldwide in 2016, and the top ten causes accounted for more than 54% of them [1]. Ischemic heart disease and stroke have been the leading causes of death worldwide for the past 15 years, together accounting for 15.2 million deaths in 2016. The rapid development of information technology produces ever more data, so determining how to use these data effectively and turn them into valuable information is crucial. Effective use of a medical database makes it possible to identify the factors and rules associated with patient deaths from past data. When similar symptoms (factors) occur during treatment, medical staff can use these factors (rules) to make the best medical decision for the patient immediately.
