Abstract

Most datasets contain redundancies and inconsistencies in their features, their instances, or both. Datasets therefore typically need pre-processing before data mining algorithms are applied. Feature selection is an important pre-processing task that retains non-redundant, informative features. It is also a multi-objective problem with conflicting criteria such as accuracy and reduction rate. This paper proposes a multi-objective CHC algorithm (a genetic algorithm with cross-generational elitist selection, heterogeneous recombination, and cataclysmic mutation) for feature selection. The algorithm, named MOCHC-FS, combines non-dominated sorting with the CHC genetic algorithm to arrive at a set of non-dominated solutions. The proposed algorithm is validated on twenty datasets from the UCI dataset repository. The results affirm that MOCHC-FS finds a range of non-dominated solutions that balance the two objectives of higher accuracy and higher reduction rate. Finally, a single feature subset is extracted from the set of non-dominated solutions. Accuracy and reduction rate are reported for each experimental dataset, using the KNN classification algorithm on the selected features only.
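To make the notion of a non-dominated solution concrete, the following is a minimal Python sketch, not the MOCHC-FS implementation itself. It assumes each candidate feature subset is scored by an (accuracy, reduction rate) pair with both values maximised, and that the reduction rate is the fraction of features removed; the function names and the candidate scores are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: each solution is (accuracy, reduction_rate), both to be maximised.
Solution = Tuple[float, float]


def reduction_rate(mask: List[int]) -> float:
    """Fraction of features removed by a binary mask (1 = keep, 0 = drop).

    This definition is an assumption made for the sketch.
    """
    return 1.0 - sum(mask) / len(mask)


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical scores, not results from the paper.
candidates = [(0.90, 0.50), (0.85, 0.70), (0.80, 0.60), (0.92, 0.30), (0.85, 0.65)]
front = non_dominated(candidates)
print(front)
```

Here (0.80, 0.60) and (0.85, 0.65) are discarded because (0.85, 0.70) is at least as good on both objectives, while the remaining three candidates each trade accuracy against reduction rate and so stay in the non-dominated set.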
