Abstract

Missing data occur in almost all research, especially in medical studies. Missing data arise when part of the research data has not been recorded or reported. This can make the sample unrepresentative of the population and lead to misguided conclusions. The extent of the missingness determines how strongly the conclusions may be distorted. Standard parameter estimation methods and prediction models assume that the data are complete, so extensive missing data lead to false predictions and increased bias. In the present study, a novel method is proposed for imputing missing medical data. The method determines which algorithm is suitable for imputing the missing data. To do so, a multiobjective particle swarm optimization algorithm was used. The algorithm imputes the missing data so that, when a prediction model is applied to the imputed data, both specificity and sensitivity are optimized. The proposed model was evaluated on real data from gastric cancer and adult T-cell leukemia/lymphoma (ATLL) patients. First, the proposed model was used to impute the missing data. Then, the missing data were imputed using deletion, mean imputation, expectation maximization, MICE, and missForest. Finally, the prediction model was applied to both sets of imputed data. The accuracy of the prediction model for the first and the second imputation methods was 0.5 and 16.5, respectively. The novel imputation method was more accurate than comparable algorithms such as expectation maximization and MICE.
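The optimization idea in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Each particle encodes one candidate value for every missing cell. The two objectives are the sensitivity and specificity of a prediction model trained on the filled-in data, and the non-dominated solutions are kept in a Pareto archive. The prediction model here is a hypothetical nearest-centroid stand-in scored on its own training data. The function names and PSO constants (`w`, `c1`, `c2`) are likewise illustrative choices, not values from the study.

```python
import numpy as np


def sens_spec(y_true, y_pred):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return np.array([sens, spec], dtype=float)


def nearest_centroid_predict(X, y):
    """Hypothetical stand-in model: fit class centroids on X, then classify X itself."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)


def dominates(a, b):
    """Pareto dominance for maximisation: a is no worse everywhere and better somewhere."""
    return bool(np.all(a >= b) and np.any(a > b))


def mopso_impute(X, y, n_particles=20, n_iter=50, w=0.5, c1=1.5, c2=1.5, seed=0):
    """Return a Pareto archive of (imputed values, [sensitivity, specificity]) pairs."""
    rng = np.random.default_rng(seed)
    miss = np.argwhere(np.isnan(X))            # (row, col) of every missing cell
    lo = np.nanmin(X, axis=0)[miss[:, 1]]      # bounds: observed range of each cell's column
    hi = np.nanmax(X, axis=0)[miss[:, 1]]

    def evaluate(pos):
        Xf = X.copy()
        Xf[miss[:, 0], miss[:, 1]] = pos       # plug candidate values into the holes
        return sens_spec(y, nearest_centroid_predict(Xf, y))

    pos = rng.uniform(lo, hi, size=(n_particles, len(miss)))
    vel = np.zeros_like(pos)
    fit = np.array([evaluate(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    archive = []                               # mutually non-dominated solutions

    def update_archive(p, f):
        nonlocal archive
        if any(dominates(af, f) or np.array_equal(af, f) for _, af in archive):
            return                             # dominated or duplicate: discard
        archive = [(ap, af) for ap, af in archive if not dominates(f, af)]
        archive.append((p.copy(), f.copy()))

    for p, f in zip(pos, fit):
        update_archive(p, f)

    for _ in range(n_iter):
        for i in range(n_particles):
            leader = archive[rng.integers(len(archive))][0]   # random archive member guides the swarm
            r1, r2 = rng.random(len(miss)), rng.random(len(miss))
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (leader - pos[i])
            pos[i] = np.clip(pos[i] + vel[i], lo, hi)
            f = evaluate(pos[i])
            if not dominates(pbest_fit[i], f):  # replace pbest unless it dominates the new point
                pbest[i], pbest_fit[i] = pos[i].copy(), f
            update_archive(pos[i], f)
    return archive


# Usage on synthetic two-class data with about 10% of cells removed
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(2, 1, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
X[rng.random(X.shape) < 0.1] = np.nan
for _, (sens, spec) in mopso_impute(X, y, n_iter=20):
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The search returns a front of trade-offs rather than one answer, so an analyst can choose between, for example, a higher-sensitivity or a higher-specificity imputation. Scoring the model on its own training data keeps the sketch short; a held-out split or cross-validation would give a less optimistic estimate.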

Highlights

  • Disease treatment is closely linked to medical observation and data interpretation

  • If the number of these records is less than 50%, the records with missing data are imputed using multivariate imputation by chained equations (MICE) to obtain at least 50% of the data. Then, the algorithms are run for the analysis

  • Low-quality data lead to low-quality conclusions. Thus, preprocessing and data cleaning are applied to improve the quality of the data
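The 50% rule in the second highlight can be sketched as a gate placed in front of an imputer. This sketch assumes "these records" means complete records (rows with no missing value). Because the highlight does not say how many incomplete records are imputed, the sketch simply imputes all of them once the gate trips. The imputer is a simplified, deterministic chained-equations loop using linear regression. Full MICE instead draws imputations from a predictive distribution and produces several imputed datasets. All function names are hypothetical.

```python
import numpy as np


def complete_fraction(X):
    """Share of records (rows) that have no missing value."""
    return float(np.mean(~np.isnan(X).any(axis=1)))


def chained_equations(X, n_iter=10):
    """Deterministic chained-equation imputation with linear regression.

    Simplified: real MICE draws from a predictive distribution and yields
    several imputed datasets; this keeps only the iterative regression idea.
    """
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.nonzero(mask)[1])    # start from column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])  # intercept + other columns
            coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            X[miss, j] = A[miss] @ coef                     # refresh only the missing cells
    return X


def impute_if_sparse(X, threshold=0.5, n_iter=10):
    """Impute only when fewer than `threshold` of the records are complete."""
    if complete_fraction(X) < threshold:
        return chained_equations(X, n_iter=n_iter)
    return X


X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 6.0],
              [3.0, 6.0, 9.0],
              [np.nan, 8.0, 12.0],
              [5.0, 10.0, np.nan]])
print(complete_fraction(X))                     # only 1 of 5 rows is complete
print(complete_fraction(impute_if_sparse(X)))   # gate tripped, every row filled
```

For real analyses, scikit-learn's `IterativeImputer` (enabled by importing `sklearn.experimental.enable_iterative_imputer`) is a MICE-inspired iterative imputer that could take the place of `chained_equations` here.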

Summary

Introduction

Disease treatment is closely linked to medical observation and data interpretation. The collection and interpretation of medical data are the foundation of medicine and health care, since these data strongly affect decision-making. All health care measures are linked to the collection, interpretation, and application of medical data [1]. Missing data are values that have not been recorded for a variable, and they are a challenge in preprocessing medical data. Missing medical data occur for different reasons and degrade the quality of the information extracted by data mining [2]. Deleting the records that contain missing values discards all the other information in those records and lowers the quality of the interpretation. Several methods have been proposed to solve this problem, but these methods reduce the quality of medical data because they introduce bias. Moreover, most of these models improve only accuracy, specificity, or sensitivity and cannot improve all of them simultaneously.
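The cost of deletion described above, and the bias of a simple fix, can be seen on a toy example. This sketch contrasts listwise (complete-case) deletion with column-mean imputation, corresponding to the deletion and mean (average) baselines compared in the abstract. The data and function names are hypothetical.

```python
import numpy as np


def listwise_delete(X, y):
    """Drop every record that has at least one missing value (complete-case analysis)."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_impute(X):
    """Replace each missing value with the mean of the observed values in its column."""
    X = X.copy()
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    return X


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0],
              [np.nan, 8.0, 9.0],
              [10.0, 11.0, 12.0]])
y = np.array([0, 1, 0, 1])

Xd, yd = listwise_delete(X, y)
print(Xd.shape)              # (2, 3)
print(yd)                    # [1 1]
print(mean_impute(X)[0, 1])  # 8.0, the mean of 5, 8 and 11
```

With only two missing cells, deletion discards half of the records, and both surviving records belong to class 1, so class 0 disappears from the sample. Mean imputation keeps every record but pulls the filled-in values toward the column centre, shrinking the variance; this is one form of the bias the introduction refers to.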

Background
Problem-Solving Using a Multiobjective Particle Swarm Optimization Algorithm
Findings
Discussion