Abstract

When preparing large databases, obtaining quality data for analysis without any missing values is almost impossible in many cases. Integrating raw data from multiple heterogeneous sources often leaves some values missing, which leads to a loss of valuable information. Although researchers have introduced many methods, comparatively little effort has been spent on handling missing values in heterogeneous attributes (both discrete and continuous) under the Missing At Random (MAR) pattern. MAR is the common scenario in which whether a value is missing depends on other covariates in the dataset. In addition, only a few techniques can deal with missing values in large databases, a gap that demands the immediate attention of researchers. This paper addresses both problems with a single technique, Bayesian Ant Colony Optimization (BACO), which combines the search capability of Ant Colony Optimization with the probabilistic nature of Bayesian principles. The algorithm is designed to impute missing values efficiently in both the discrete and the continuous attributes of large datasets. BACO is applied to six large real-world datasets, and its imputation accuracy is observed to exceed that of existing standard techniques. Statistical tests further confirm the superiority of BACO's imputation results.
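To make the Missing At Random setting concrete, the following Python sketch generates a small synthetic dataset with one continuous attribute, one discrete attribute, and a fully observed covariate. It then masks the continuous attribute with a probability that depends only on that observed covariate. This example illustrates the missingness pattern only. It is not BACO and is not the authors' experimental setup. The function names `make_mar_dataset` and `missing_rate`, the 0.5 threshold, and the 0.6/0.1 missingness probabilities are all hypothetical choices made for illustration.

```python
import random


def make_mar_dataset(n=1000, seed=0):
    """Hypothetical generator: rows of (x, y, category) where y may be None.

    The missingness of y depends only on the always-observed covariate x,
    which is the defining property of a Missing At Random (MAR) pattern.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # observed continuous covariate
        y = 2.0 * x + rng.gauss(0.0, 0.5)     # continuous attribute that may go missing
        category = "a" if x < 0 else "b"      # discrete attribute, always observed here
        # MAR mechanism (illustrative numbers): the chance that y is missing
        # is driven by the observed x, not by the unseen value of y itself.
        p_missing = 0.6 if x > 0.5 else 0.1
        if rng.random() < p_missing:
            y = None
        rows.append((x, y, category))
    return rows


def missing_rate(rows, predicate):
    """Fraction of rows satisfying `predicate` whose y value is missing."""
    subset = [r for r in rows if predicate(r)]
    return sum(r[1] is None for r in subset) / len(subset)


rows = make_mar_dataset()
high = missing_rate(rows, lambda r: r[0] > 0.5)
low = missing_rate(rows, lambda r: r[0] <= 0.5)
print(f"missing rate of y when x > 0.5: {high:.2f}, otherwise: {low:.2f}")
```

Because the missing rate of `y` differs sharply between the two ranges of `x`, the observed `y` values are not a random sample of all `y` values. An imputation method for MAR data therefore has to condition on the covariates, which is the situation the abstract describes.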
