Abstract

Cancer is a leading cause of death worldwide. Identifying a small set of genes relevant to a given cancer can lead to effective treatments. The main challenge with microarray datasets is their high dimensionality: the number of features is huge compared to the modest number of samples. Recent research efforts have attempted to reduce this dimensionality using different feature selection techniques. This paper presents an ensemble feature selection technique based on the t-test and a genetic algorithm. After preprocessing the data using a t-test, a Nested Genetic Algorithm, namely Nested-GA, is used to find the optimal subset of features by combining data from two different datasets. Nested-GA consists of two nested genetic algorithms (outer and inner) that run on two different kinds of datasets. The Outer Genetic Algorithm (OGA-SVM) works on microarray gene expression datasets, whereas the Inner Genetic Algorithm (IGA-NNW) runs on DNA methylation datasets. Nested-GA is applied to a colon cancer dataset with 5-fold cross-validation. After applying Nested-GA, the Incremental Feature Selection (IFS) strategy is used to obtain the smallest optimal gene subset. This gene subset has been validated on an independent dataset, resulting in 99.9% classification accuracy. In addition, the biological significance of the resulting optimal genes is validated using enrichment analysis. Moreover, the results of Nested-GA have been compared to those of other feature selection algorithms that were run on either gene expression or DNA methylation datasets. In the experimental results, Nested-GA showed the highest classification performance with a small optimal feature subset compared to the other algorithms. Furthermore, running Nested-GA on lung cancer datasets that contain two different cancer subtypes yielded significantly better classification accuracy (98.4%) than a previous study (84.6%) that used lung cancer DNA methylation data only.
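
The two non-genetic steps named above, t-test preprocessing and Incremental Feature Selection, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that genes are ranked by the absolute Welch t-statistic and that IFS adds ranked genes one at a time, keeping the smallest prefix with the best 5-fold cross-validated accuracy. The nearest-centroid classifier, the function names, the `max_genes` cap, and the synthetic data are all hypothetical stand-ins, and the Nested-GA stage itself is omitted.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes_by_t(X, y):
    """Return gene indices sorted by decreasing |t| between the two classes."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def centroid_predict(X_train, y_train, x, genes):
    """Nearest-centroid classifier on the given genes (assumed stand-in)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, genes, k=5):
    """k-fold cross-validated accuracy using interleaved folds."""
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        correct += sum(
            centroid_predict(X_tr, y_tr, X[i], genes) == y[i] for i in test_idx
        )
    return correct / len(X)


def incremental_feature_selection(X, y, ranked, max_genes=10):
    """Grow the subset one ranked gene at a time; keep the smallest best prefix."""
    best_genes, best_acc = [], -1.0
    for n in range(1, min(max_genes, len(ranked)) + 1):
        acc = cv_accuracy(X, y, ranked[:n])
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_genes, best_acc = ranked[:n], acc
    return best_genes, best_acc


# Synthetic data (not from the paper): 40 samples, 20 genes,
# of which only genes 0 and 1 differ between the two classes.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(6.0 * label if g < 2 else 0.0, 1.0) for g in range(20)]
     for label in y]

ranked = rank_genes_by_t(X, y)
selected, accuracy = incremental_feature_selection(X, y, ranked)
print(selected, accuracy)
```

For brevity, the sketch ranks genes on all samples before cross-validation. A stricter evaluation would repeat the ranking inside each training fold so that the held-out samples do not influence which genes are chosen.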
