Abstract

Classifiers and imputation methods play crucial roles in big data analytics. In particular, for data sets characterized by horizontal scattering, vertical scattering, level of spread, compound metric, imbalance ratio, and missing ratio, the way classifiers and imputation methods are combined leads to significantly different performance. It is therefore essential to identify the characteristics of a data set in advance so that the optimal combination of imputation method and classifier can be selected. However, this is a very costly process. This paper proposes a novel method for the automatic, adaptive selection of the optimal combination of classifier and imputation method on the basis of the features of a given data set. In performance evaluations with multiple data sets, the proposed method demonstrated superior performance. Decision makers in big data analytics could benefit greatly from the proposed method when dealing with data sets in which the distribution of missing data varies in real time.
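
As a rough illustration of two of the data set characteristics named above, the sketch below computes a missing ratio and an imbalance ratio for a toy data set. The abstract does not give the paper's definitions, so the ones used here are assumptions: missing ratio as the fraction of cells that are missing, and imbalance ratio as the majority class count divided by the minority class count. The function names `missing_ratio` and `imbalance_ratio` and the toy data are hypothetical.

```python
from collections import Counter


def missing_ratio(rows, missing=None):
    """Fraction of all cells that hold the missing marker (assumed definition)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    n_missing = sum(1 for row in rows for value in row if value is missing)
    return n_missing / total


def imbalance_ratio(labels):
    """Majority class count divided by minority class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy data set: 4 rows, 3 features, with None marking a missing value.
rows = [
    [1.0, None, 3.0],
    [4.0, 5.0, None],
    [7.0, 8.0, 9.0],
    [None, 2.0, 6.0],
]
labels = ["a", "a", "a", "b"]

mr = missing_ratio(rows)      # 3 missing cells out of 12
ir = imbalance_ratio(labels)  # 3 samples of "a" versus 1 of "b"
print(mr, ir)
```

Characteristics like these could serve as inputs to a selection step that picks a classifier and imputation method, though how the proposed method uses them is not described in the abstract.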

