Abstract

Tumor classification based on gene expression profiles from DNA microarray datasets has drawn great attention, and gene selection plays a significant role in improving classification performance on microarray data. This study proposes an effective hybrid gene selection method for tumor classification based on ReliefF and the ant colony optimization (ACO) algorithm. First, for the ReliefF algorithm, the average distance among the k nearest or k non-nearest neighbor samples is introduced to estimate the difference among samples. On this basis, the distances between samples in the same class and in different classes are defined, so that the weight values of genes can be evaluated more effectively. To obtain stable results in exceptional cases, a distance coefficient is developed to construct a new formula for updating the weight coefficients of genes, which further reduces instability during calculation. Decreasing the distance between same-class samples and increasing the distance between different-class samples makes the separation of the weights more pronounced. The ReliefF algorithm is thus improved to reduce the initial dimensionality of gene expression datasets and obtain a candidate gene subset. Second, a new pruning rule is designed to reduce dimensionality further and obtain a new candidate subset with a smaller number of genes. A probability formula for the next node on the path selected by the ants is presented to highlight the closeness of the correlation between the reaction variables. To increase the pheromone concentration of important genes, a new pheromone updating formula for the ACO algorithm is adopted to prevent the pheromone left by the ants from being overwhelmed over time, and the weight coefficients of the genes are applied to eliminate the interference of different data as much as possible.
As a result, the improved ACO algorithm has strong positive feedback and converges quickly to an optimal solution through the accumulation and updating of pheromone. Finally, by combining the improved ReliefF algorithm and the improved ACO method, a hybrid filter-wrapper gene selection algorithm called RFACO-GS is proposed. Experimental results on several public gene expression datasets demonstrate that the proposed method is very effective: it significantly reduces the dimensionality of gene expression datasets and selects the most relevant genes with high classification accuracy.
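
To make the filter stage concrete, here is a minimal sketch of classic ReliefF-style gene weighting. It is not the improved variant described in the abstract: the distance coefficient and the average-distance terms are not specified here, so they are omitted. The function name `relieff_weights`, the Manhattan distance, and the pooling of all other-class samples as misses are assumptions made for illustration.

```python
from typing import List, Sequence


def relieff_weights(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> List[float]:
    """Classic ReliefF-style weights (illustrative; not the paper's improved formula).

    A gene gains weight when it differs across classes (nearest misses)
    and loses weight when it differs within a class (nearest hits).
    """
    n, d = len(X), len(X[0])

    # Per-gene value range, used to scale differences into [0, 1].
    spans = []
    for g in range(d):
        col = [row[g] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a: int, b: int, g: int) -> float:
        return abs(X[a][g] - X[b][g]) / spans[g]

    def dist(a: int, b: int) -> float:
        # Assumed metric: Manhattan distance over scaled gene values.
        return sum(diff(a, b, g) for g in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]      # k nearest same-class samples
        misses = [j for j in others if y[j] != y[i]][:k]    # k nearest other-class samples
        for g in range(d):
            if hits:
                weights[g] -= sum(diff(i, j, g) for j in hits) / (len(hits) * n)
            if misses:
                weights[g] += sum(diff(i, j, g) for j in misses) / (len(misses) * n)
    return weights
```

Genes would then be ranked by weight, and the top-ranked ones kept as the candidate subset passed to the wrapper stage. The cut-off used for that step is not given in this section.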

Highlights

  • Given the small number of effective samples compared to the thousands or even tens of thousands of genes in microarray data, many computational methods fail to identify the small portion of important genes, which increases learning costs and degrades learning performance[6,7]

  • We conduct extensive experiments to verify the classification performance of the proposed RFACO-GS algorithm; the simulation experiments are performed on six public gene expression datasets, which can be downloaded at http://bioinformatics.rutgers.ed/Static/Supplemens/ CompCancer/datasets

  • Following prior work on parameter setting[54,55], the detailed parameters of the RFACO-GS algorithm are as follows: the number of ants is r = 100 in Algorithm 1, the maximum number of iterations is 80, and, since the amount Q of pheromone deposited on the path by the ants in each iteration is related to the distance between nodes i and j[56], Q = 100 is set in Eq. (6)
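
The parameter values listed above can be collected in a small configuration object, shown here with a textbook ant-system pheromone update. This is only a sketch: Eq. (6) is not reproduced in this section, so the update below is the standard evaporate-then-deposit rule rather than the paper's formula. The names `AcoParams` and `update_pheromone` and the evaporation rate `rho` are assumptions, not values from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


@dataclass(frozen=True)
class AcoParams:
    n_ants: int = 100          # r = 100 (from the text)
    max_iterations: int = 80   # maximum number of iterations (from the text)
    q: float = 100.0           # pheromone amount Q (from the text)
    rho: float = 0.1           # evaporation rate: assumed, not given in the text


def update_pheromone(
    tau: Dict[Edge, float],
    tours: List[Tuple[List[int], float]],
    params: AcoParams,
) -> Dict[Edge, float]:
    """Textbook ant-system update (not the paper's Eq. (6)).

    Every edge first loses a fraction rho of its pheromone; each ant then
    deposits Q / cost on the edges of its tour, so cheaper tours deposit more.
    """
    new_tau = {edge: (1.0 - params.rho) * level for edge, level in tau.items()}
    for path, cost in tours:
        deposit = params.q / cost  # cost must be positive
        for edge in zip(path, path[1:]):
            new_tau[edge] = new_tau.get(edge, 0.0) + deposit
    return new_tau
```

In a full run, this update would be called once per iteration, up to `max_iterations` times, after all `n_ants` ants have built their paths.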


Summary

Introduction

Given the small number of effective samples compared to the thousands or even tens of thousands of genes in microarray data, many computational methods fail to identify the small portion of important genes, which increases learning costs and degrades learning performance[6,7]. Wang et al.[18] presented a wrapper-based gene selection method that introduces the Markov blanket technique to reduce the required wrapper evaluation time. This wrapper approach achieved superior classification accuracy but needs much more computational power[19]. Some wrapper-based approaches have been widely applied in bioinformatics, such as the genetic algorithm[34] (GA), particle swarm optimization[35] (PSO), ant colony optimization[36] (ACO), and so on. Although these approaches have obtained excellent performance in gene expression data analysis, some inherent drawbacks remain, such as the excessive computational cost of GA and the local optima of PSO[36]. By integrating the complementary strengths of filter and wrapper approaches, some hybrid methods have been developed to select significant features and balance efficiency against accuracy when selecting an optimal feature subset[43]. The experiments show that this method can effectively remove the irrelevant and redundant genes of classification data and improve the classification performance.

