Abstract
Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine suitable parameter values for these three modified algorithms, in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.
Highlights
Clustering has become crucial in several areas of scientific research, and it has a significant impact on theoretical development and applied research in several scientific disciplines [1,2,3].
We introduce three clustering algorithms: clustering based on Artificial Bee Colony (CABC), clustering based on Firefly Algorithm (CFA), and clustering based on Novel Bat Algorithm (CNBA).
Our proposal is a valuable contribution to the field, because most of the proposed clustering algorithms operate only over numerical data and cannot solve problems that are relevant for human activities and whose data are described not by numeric attributes alone, but by mixed and incomplete data.
Summary
Clustering has become crucial in several areas of scientific research, and it has a significant impact on theoretical development and applied research in several scientific disciplines [1,2,3]. The lack of information is denoted with the symbol “?” in many datasets available in public repositories, such as the UCI Machine Learning Repository [17] or the KEEL Project Repository [18,19]. Following this criterion, the value “?” is included in the domain of definition of each dataset feature; the description of an incomplete object has the value “?” in place of the missing value. It is entirely possible for a dataset to be described by hybrid features while also presenting missing values for some patterns. In this instance, the challenge faced by clustering algorithms is even greater, since they need to handle both problems.
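To make the “?” convention concrete, the following is a minimal sketch of how two patterns with mixed features and missing values might be compared. It uses the Heterogeneous Euclidean-Overlap Metric (HEOM), a commonly used dissimilarity for such data; this is an illustrative choice and not necessarily the dissimilarity used by CABC, CFA, or CNBA. The names `heom`, `MISSING`, and `numeric_ranges` are hypothetical and introduced only for this example.

```python
import math

# Marker for a missing value, following the "?" convention described above.
MISSING = "?"


def heom(x, y, numeric_ranges):
    """Heterogeneous Euclidean-Overlap Metric between two patterns.

    x, y            -- sequences of feature values (numbers, category labels, or "?")
    numeric_ranges  -- dict mapping the index of each numeric feature to its
                       range (max - min) in the dataset; any index not in the
                       dict is treated as categorical (an assumption of this sketch)
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if a == MISSING or b == MISSING:
            # A missing value contributes the maximum per-feature distance.
            d = 1.0
        elif i in numeric_ranges:
            # Numeric feature: absolute difference normalized by the feature range.
            rng = numeric_ranges[i]
            d = abs(float(a) - float(b)) / rng if rng > 0 else 0.0
        else:
            # Categorical feature: overlap distance (0 if equal, 1 otherwise).
            d = 0.0 if a == b else 1.0
        total += d * d
    return math.sqrt(total)


if __name__ == "__main__":
    # Feature 0 is categorical; feature 1 is numeric with an assumed range of 4.0.
    ranges = {1: 4.0}
    print(heom(["red", 1.0], ["red", 3.0], ranges))
    print(heom(["red", MISSING], ["blue", 3.0], ranges))
```

Treating a missing value as maximally distant is only one possible policy; it lets incomplete patterns take part in the comparison without imputation, at the cost of pushing them away from every other pattern.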