Abstract
The common challenge for machine learning and data mining tasks is the curse of high dimensionality. Feature selection reduces the dimensionality by selecting a relevant and optimal subset of features from a large dataset. In this research work, a clustering and genetic algorithm based feature selection (CLUST-GA-FS) method is proposed that has three stages, namely irrelevant feature removal, redundant feature removal, and optimal feature generation. The performance of feature selection algorithms is commonly analyzed using parameters such as classification accuracy, precision, recall, and error rate. Recently, increasing attention has been given to the stability of feature selection algorithms, an indicator of whether similar subsets of features are selected every time the algorithm is executed on the same dataset. This work analyzes the stability of the proposed algorithm on four publicly available datasets using the stability measures average normal Hamming distance (ANHD), Dice's coefficient, Tanimoto distance, Jaccard's index, and the Kuncheva index.
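For reference, the sketch below illustrates, in Python and not the authors' code, the standard pairwise definitions of the stability measures named above, computed between feature subsets selected in two runs over the same n candidate features; the exact variants evaluated in the paper may differ.

```python
# Minimal sketch of standard pairwise stability measures between two
# feature-selection runs (assumed definitions, not the paper's implementation).

def anhd(mask_a, mask_b):
    """Normalized Hamming distance between two binary selection masks."""
    assert len(mask_a) == len(mask_b)
    return sum(a != b for a, b in zip(mask_a, mask_b)) / len(mask_a)

def dice(set_a, set_b):
    """Dice's coefficient: 2|A∩B| / (|A| + |B|)."""
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

def jaccard(set_a, set_b):
    """Jaccard's index: |A∩B| / |A∪B|."""
    return len(set_a & set_b) / len(set_a | set_b)

def tanimoto_distance(set_a, set_b):
    """Tanimoto distance, i.e. the complement of the Jaccard/Tanimoto similarity."""
    return 1.0 - jaccard(set_a, set_b)

def kuncheva(set_a, set_b, n):
    """Kuncheva's consistency index for two subsets of equal size k drawn
    from n features: (r*n - k^2) / (k*(n - k)), with r = |A∩B|."""
    k = len(set_a)
    r = len(set_a & set_b)
    return (r * n - k * k) / (k * (n - k))

# Example: feature indices selected in two runs over n = 10 features.
run1, run2 = {0, 2, 3, 7}, {0, 2, 5, 7}
print(jaccard(run1, run2), dice(run1, run2), kuncheva(run1, run2, n=10))
```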