Abstract

Microarray gene expression data offer a promising way to diagnose disease and classify cancer. However, the gene selection problem in bioinformatics, i.e., how to select the most informative genes from thousands of candidates, remains challenging. It is a specific feature selection problem characterized by high-dimensional features and small sample sizes. In this paper, a two-stage method combining a filter feature selection method and a wrapper feature selection method is proposed to solve the gene selection problem. Unlike most existing methods, the proposed method models gene selection as a multiobjective optimization problem. Both stages employ the same multiobjective differential evolution (MODE) algorithm as the search strategy but use different objective functions. The three objective functions of the filter method are mainly based on mutual information. The two objective functions of the wrapper method are the number of selected features and the classification error of a naive Bayes (NB) classifier. Finally, the performance of the proposed method is tested and analyzed on six benchmark gene expression datasets. The experimental results verify that applying a multiobjective optimization algorithm in this way provides a novel and effective approach to the gene selection problem.
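The wrapper stage scores each candidate gene subset with two objectives to be minimized: the number of selected genes and the NB classification error. A minimal sketch of that evaluation follows, under stated assumptions: a candidate is encoded as a binary mask over genes, the NB variant is Gaussian, and the error is estimated by leave-one-out. The abstract confirms none of these details, and the names `fit_gaussian_nb`, `predict_nb`, and `wrapper_objectives` are hypothetical.

```python
import math

# Sketch only: the binary-mask encoding, the Gaussian NB variant, and the
# leave-one-out error estimate are assumptions, not the paper's stated setup.


def fit_gaussian_nb(X, y, features):
    """Estimate class priors and per-gene Gaussian mean/variance on the selected genes."""
    n = len(y)
    model = {}
    for label in sorted(set(y)):
        rows = [X[i] for i in range(n) if y[i] == label]
        params = []
        for j in features:
            vals = [row[j] for row in rows]
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance positive for constant genes.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            params.append((mean, var))
        model[label] = (len(rows) / n, params)
    return model


def predict_nb(model, x, features):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, params) in model.items():
        score = math.log(prior)
        for j, (mean, var) in zip(features, params):
            score -= 0.5 * math.log(2 * math.pi * var) + (x[j] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def wrapper_objectives(mask, X, y):
    """Map a binary gene mask to (number of selected genes, NB error); both are minimized."""
    features = [j for j, bit in enumerate(mask) if bit]
    if not features:
        # Hypothetical convention: an empty subset receives the worst possible error.
        return (0, 1.0)
    errors = 0
    for i in range(len(y)):  # leave-one-out: train on all samples except sample i
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features)
        if predict_nb(model, X[i], features) != y[i]:
            errors += 1
    return (len(features), errors / len(y))


# Toy data (illustrative only): 6 samples x 3 genes; gene 0 separates the classes,
# gene 1 is noise, gene 2 is constant.
X = [[1.0, 5.0, 0.0], [1.2, 3.0, 0.0], [0.9, 4.0, 0.0],
     [3.0, 4.5, 0.0], [3.1, 3.2, 0.0], [2.9, 5.1, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(wrapper_objectives([1, 0, 0], X, y))  # (1, 0.0)
```

Returning the objectives as a tuple, rather than a weighted sum, leaves the trade-off between subset size and error to the multiobjective search, which is the point of using MODE instead of a single-objective DE.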

Highlights

  • Gene selection is an important issue in bioinformatics [1]

  • Two single-objective wrapper methods based on differential evolution (DE) are proposed in this stage. These two single-objective methods serve as baselines to test the performance of the multiobjective differential evolution (MODE)-based wrapper method and help us investigate (1) whether it is necessary to consider the number of selected features in the wrapper method and (2) whether the method based on multiobjective optimization outperforms the methods based on single-objective optimization

  • Since MODE obtains a set of nondominated solutions in each independent run, five independent sets of nondominated solutions with three objectives are generated
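Nondominance is the criterion that defines each set MODE returns. A minimal sketch of Pareto dominance, of filtering a list of objective vectors down to its nondominated members, and of pooling the fronts of several independent runs follows. It assumes every objective is expressed as "smaller is better" (a maximized criterion, such as a mutual-information score, would be negated first). Pooling and re-filtering runs is a hypothetical post-processing step, not a procedure the highlights describe. The helper names and the objective values in the example are illustrative only.

```python
# Sketch only: assumes all objectives are minimized; maximized criteria
# would be negated before comparison.


def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization."""
    return all(x <= z for x, z in zip(a, b)) and any(x < z for x, z in zip(a, b))


def nondominated(points):
    """Keep the vectors that no other vector in the list dominates."""
    # dominates(p, p) is False, so comparing a point with itself is harmless.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def merge_runs(runs):
    """Hypothetical step: pool the fronts of independent runs and re-filter the union."""
    pooled = [p for front in runs for p in front]
    return nondominated(pooled)


# Two made-up fronts with three objectives each.
run1 = [(0.2, 0.5, 0.9), (0.4, 0.3, 0.8)]
run2 = [(0.1, 0.6, 0.9), (0.4, 0.4, 0.9)]
print(merge_runs([run1, run2]))
# [(0.2, 0.5, 0.9), (0.4, 0.3, 0.8), (0.1, 0.6, 0.9)]
```

Here (0.4, 0.4, 0.9) from the second run is dropped because (0.4, 0.3, 0.8) from the first run is no worse in any objective and strictly better in two, which shows why fronts from separate runs are not automatically nondominated with respect to each other.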

Summary

Introduction

Gene selection is an important issue in bioinformatics [1]. A gene is the basic functional unit of heredity, and gene products, in turn, dictate cellular function. Therefore, abnormal gene expression is often correlated with various diseases, such as cancer [3]. Many diseases correspond to unique gene expression profiles that can be revealed by DNA microarray technology [4]. Microarray data for a given disease consist of a set of biological samples, in each of which the expression levels of thousands of genes are measured. Such data are usually organized as a matrix, with one dimension indexing samples and the other indexing genes. Because of the high dimensionality of microarray data, it is not easy for researchers to determine which genes are responsible for a given disease. Thus, effectively selecting the most significant genes for further analysis has become an urgent and vital problem.
