Abstract

Objective: Investigating the effects of missing data, and methods for overcoming the problems that missingness causes in statistical models, is an important research topic because real data are complex and often contain missing observations. The main statistical approaches for handling missing data are complete case analysis and missing data imputation. Handling missing data properly requires evaluating the missing data mechanisms and patterns. However, identifying the missing data mechanism is not easy in relatively large data sets. Owing to computational advances, deep learning algorithms have recently been widely used for classification, regression, and clustering tasks on large data sets. The objective of this study is to show how missing data mechanisms affect the performance of a deep learning algorithm on binary classification problems.

Material and Method: To achieve this aim, an extensive simulation study was conducted on a virtual machine on Microsoft Azure. The simulation varied the proportion of missing values, the correlation structure, and the missing data mechanism in a large data set. For each missing data mechanism, the performance of deep learning after listwise deletion and after imputation was compared with its performance on the original (complete) data set.

Results: The proportion and the mechanism of missingness affected the performance of deep learning only slightly, whereas the correlation level of the data had a relatively larger effect.

Conclusion: Although slight differences were observed in the area under the curve (AUC) values, deep learning algorithms can overcome the problems caused by missingness in large data sets.
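The simulation design summarized in the abstract (correlated features, a chosen proportion of missing values, a missing data mechanism, then listwise deletion or imputation) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' code. The abstract does not state the correlation structure, how missingness was generated, or which imputation method was used, so the exchangeable correlation matrix, the exponential weighting for MAR and MNAR, and the use of mean imputation are all assumptions. The helper names (`correlated_data`, `make_missing`, `listwise_deletion`, `mean_impute`) are hypothetical. The deep learning classifier and the AUC evaluation are omitted.

```python
import numpy as np


def correlated_data(n, p, rho, rng):
    """Draw n rows of p standard-normal features with an exchangeable
    (compound-symmetry) correlation: 1 on the diagonal, rho elsewhere.
    Assumption: the study's actual correlation structure is not stated."""
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(p), cov, size=n)


def make_missing(X, prop, mechanism, rng):
    """Return a copy of X with round(prop * n) entries of column 1 set to NaN.

    MCAR: every row is equally likely to be selected.
    MAR:  selection weight grows with the fully observed column 0.
    MNAR: selection weight grows with the (now hidden) value in column 1.
    The exp() weighting is a hypothetical choice for illustration."""
    X = X.copy()
    n = X.shape[0]
    k = int(round(prop * n))
    if mechanism == "MCAR":
        weights = np.ones(n)
    elif mechanism == "MAR":
        weights = np.exp(X[:, 0])
    elif mechanism == "MNAR":
        weights = np.exp(X[:, 1])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Weighted sampling of k distinct rows; heavier rows are more likely picked.
    rows = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    X[rows, 1] = np.nan
    return X


def listwise_deletion(X, y):
    """Complete case analysis: drop every row that has any missing feature."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_impute(X):
    """Replace each NaN with its column's observed mean (a simple stand-in
    for whatever imputation method the study actually used)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = correlated_data(1000, 5, 0.5, rng)
    # Hypothetical binary outcome from a logistic model with unit coefficients.
    prob = 1.0 / (1.0 + np.exp(-(X @ np.ones(5))))
    y = rng.binomial(1, prob)
    for mech in ("MCAR", "MAR", "MNAR"):
        Xm = make_missing(X, 0.2, mech, rng)
        Xd, yd = listwise_deletion(Xm, y)
        Xi = mean_impute(Xm)
        # Each line reports 800 complete rows and 0 NaNs after imputation.
        print(mech, Xd.shape[0], int(np.isnan(Xi).sum()))
```

In this sketch, MAR missingness depends only on an observed feature, while MNAR missingness depends on the value that goes missing, which is the distinction the abstract's "missing data mechanisms" refers to. Mean imputation was chosen only because it is the simplest option; model-based or multiple imputation would slot into the same place.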
