Abstract
Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally contain irrelevant and redundant features alongside the relevant ones, and these degrade the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discrimination ability of the feature set, which in turn improves the clustering result. Although many metaheuristics have been proposed to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require the objects' class information a priori, which is unavailable in unsupervised feature selection. Second, feature subset selection is driven by a single validity measure; hence, it produces a single best solution that is biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a lack of balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method that requires no class information. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually conflicting validity measures, the silhouette index (Sil) and feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in a recent metaheuristic based on Newton's law of gravity, the binary gravitational search algorithm (BGSA), in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective methods in a wrapper framework (MBGSA, MOPSO, and NSGA-II) and with Pearson's linear correlation coefficient (FM-CC) as a multi-objective filter method.
We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also confirms this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The quality of the final solution achieved by IMBGSAFS is superior to that of its competitors in terms of clustering accuracy and/or smaller subset size.
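
The Pareto-based trade-off between the silhouette index (Sil, to be maximized) and feature cardinality (d, to be minimized) can be illustrated with a minimal sketch. This is not the IMBGSAFS implementation: the function names `dominates` and `pareto_front` and the candidate values are hypothetical, chosen only to show how non-dominated feature subsets would be retained in an external archive.

```python
from typing import List, Tuple

# A candidate feature subset summarized by its two objective values:
# (silhouette index, number of selected features). Values are hypothetical.
Solution = Tuple[float, int]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b on both objectives and strictly
    better on at least one (Sil is maximized, d is minimized)."""
    sil_a, d_a = a
    sil_b, d_b = b
    no_worse = sil_a >= sil_b and d_a <= d_b
    strictly_better = sil_a > sil_b or d_a < d_b
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Illustrative (made-up) objective values for five candidate subsets.
candidates = [(0.62, 8), (0.60, 5), (0.55, 5), (0.40, 2), (0.61, 9)]
front = pareto_front(candidates)
print(front)
```

In this sketch, a subset with a higher Sil but more features and a subset with a lower Sil but fewer features can both survive, which is why a further criterion (the F-measure, in the abstract) is needed to pick a single final solution.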