Abstract

Gene expression microarray technologies have enabled biological classification based on the expression levels of thousands to tens of thousands of genes. However, most genes in a DNA microarray experiment are irrelevant to classification. To find the target gene set faster and more accurately for microarray-based cancer classification, this study investigated the existing mainstream gene selection techniques based on a hybrid filter-wrapper model. On this basis, we present a novel hybrid gene selection algorithm named TRF-WGHC (Top-Ranking Filter and Wrapper-based Greedy Hill-Climbing), whose main advantages are its simplicity and effectiveness. TRF-WGHC selects genes in two steps. First, using a specific ranking metric, it keeps a small top-n percentage of genes and eliminates those whose scores fall below the threshold. Second, it searches for the optimal subset of the remaining genes using an augmented greedy hill-climbing algorithm. We performed comprehensive experiments comparing TRF-WGHC with other state-of-the-art algorithms on 18 publicly available microarray expression datasets. Theoretical analysis and experimental results show that TRF-WGHC is a simple but highly effective gene selection algorithm for the classification of microarray datasets.
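
The two-step procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the ranking metric nor the details of the "augmented" hill-climbing, so the sketch assumes precomputed per-gene scores, a plain forward greedy hill-climbing search, and a caller-supplied `evaluate` function standing in for a wrapper criterion such as cross-validated classifier accuracy. The names `top_ranking_filter`, `greedy_hill_climb`, `evaluate`, and `toy_evaluate` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def top_ranking_filter(scores: Sequence[float], top_percent: float) -> List[int]:
    """Filter step (assumed form): keep the indices of the top-scoring genes.

    `scores` holds one relevance score per gene from some ranking metric
    (the abstract does not say which); `top_percent` is the share to keep.
    """
    n_keep = max(1, int(len(scores) * top_percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def greedy_hill_climb(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Wrapper step (plain forward search, not the paper's augmented variant).

    Repeatedly adds the single gene that most improves `evaluate`, and stops
    as soon as no addition gives a strict improvement.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while True:
        best_gene = None
        for gene in candidates:
            if gene in selected:
                continue
            score = evaluate(selected + [gene])
            if score > best_score:
                best_score, best_gene = score, gene
        if best_gene is None:  # no candidate improved the score
            return selected, best_score
        selected.append(best_gene)


# Toy data: ten genes with made-up scores (illustrative only).
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.05, 0.3, 0.6, 0.0, 0.4]


def toy_evaluate(subset: List[int]) -> float:
    """Stand-in for classifier accuracy: rewards genes 0 and 4, penalizes size."""
    return len(set(subset) & {0, 4}) - 0.1 * len(subset)


kept = top_ranking_filter(scores, top_percent=50)
subset, subset_score = greedy_hill_climb(kept, toy_evaluate)
```

In a real experiment, `evaluate` would typically train and validate a classifier on the expression values of the candidate subset. That is the expensive part of any wrapper method, which is why the filter step shrinks the candidate pool first.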
