Abstract

In the last century, the challenge was to develop new technologies for storing large amounts of data. Today, the challenge is to make effective use of that data and to extract knowledge that benefits business, scientific, and government applications by using a subset of features rather than the full feature set. In this paper, we focus on feature selection techniques as a means of obtaining high-quality attributes that enhance the mining process. Feature selection touches every discipline that requires knowledge discovery from large data. In our study, we compared benchmark feature selection methods on three well-known datasets using three well-recognized machine learning algorithms. The study found that feature selection methods can improve the performance of learning algorithms. However, no single feature selection method best suits all datasets and learning algorithms. Machine learning researchers should therefore understand the characteristics of their datasets and learning algorithms in order to obtain better outcomes. Overall, correlation-based feature selection (CFS) and consistency-based subset evaluation (CB) performed better than information gain, symmetrical uncertainty, Relief (RF), and principal component analysis (PC).
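To illustrate the kind of comparison described in the abstract, the following is a minimal sketch in scikit-learn that contrasts a classifier trained on all attributes with one trained on a reduced subset. It approximates information gain with mutual information; the dataset, classifier, and number of retained features are illustrative assumptions and do not reproduce the study's exact methods (e.g., CFS or consistency-based evaluation) or datasets.

```python
# Sketch only: compare a learner on the full feature set vs. a selected subset.
# Dataset, classifier, and k=10 are illustrative assumptions, not the study's setup.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)

# Baseline: learn on all features.
baseline = cross_val_score(GaussianNB(), X, y, cv=10).mean()

# With feature selection: keep the 10 attributes with the highest mutual information.
selected = make_pipeline(SelectKBest(mutual_info_classif, k=10), GaussianNB())
reduced = cross_val_score(selected, X, y, cv=10).mean()

print(f"All features:        {baseline:.3f}")
print(f"Top-10 by mutual info: {reduced:.3f}")
```

As the abstract notes, whether the reduced subset helps depends on the dataset and the learning algorithm, so such comparisons should be repeated across several selection methods and learners.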
