Abstract

Feature selection is a key challenge that must be addressed before classification can take place. An effective feature selection method increases classification accuracy while reducing computational cost and time. This study considers a variety of filter approaches combined with different search algorithms. Five traditional classifiers were evaluated on the selected gene subsets: Random Forest, Sequential Minimal Optimization (SMO), Naive Bayes, Decision Trees, and K-Nearest Neighbour. The datasets chosen for this analysis are microarray gene expression data for two types of cancer: Acute Lymphocytic Leukaemia (ALL)/Acute Myeloid Leukaemia (AML) and lung cancer. According to the experimental results, a fuzzy rough subset approach combined with Genetic Search selects optimal, relevant gene subsets and yields high classification accuracy. Among the classical classifiers compared here, Random Forest achieves 94.33% accuracy on the raw dataset and 100% accuracy after feature selection is applied. The results are validated using conventional measures: Precision, Recall, F-Score, Receiver Operating Characteristic (ROC), and the Matthews correlation coefficient (MCC).
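The validation measures named above can be illustrated with a minimal Python sketch for a two-class problem such as ALL versus AML. This is not the study's evaluation code: the function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only to show how Precision, Recall, F-Score, and MCC relate to one another.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-score, and MCC from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios (zero denominator) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_score = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score, "mcc": mcc}


# Hypothetical counts for illustration only (not results from the study).
scores = binary_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A classifier that labels every sample correctly, as reported for Random Forest after feature selection, would score 1.0 on all four measures in this sketch.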
