Abstract

Gene expression data are usually redundant, and only a subset of the genes shows distinct expression profiles across different classes of samples. Selecting highly discriminative genes from gene expression data has therefore attracted increasing interest in bioinformatics. In this paper, a multiobjective binary differential evolution method (MOBDE) is proposed to select a small subset of informative genes relevant to classification. First, the Fisher-Markov selector is used to choose the top-ranked features of the gene expression data. Second, to adapt differential evolution to the binary problem, a novel binary mutation method is proposed to balance exploration and exploitation. Third, the multiobjective binary differential evolution is constructed by integrating the summation of normalized objectives and diversity selection into the binary differential evolution algorithm. Finally, MOBDE is used for feature selection, with a support vector machine (SVM) as the classifier evaluated by leave-one-out cross-validation (LOOCV). To show the effectiveness and efficiency of the algorithm, the proposed method is tested on ten gene expression datasets. Experimental results demonstrate that the proposed method is effective.
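The filter-then-search pipeline described above can be sketched with a few helpers. This is a minimal illustration under stated assumptions, not the MOBDE implementation: a plain Fisher score stands in for the Fisher-Markov selector, `binary_mutation` is a generic binary DE scheme rather than the paper's proposed operator, and `normalized_sum` scalarizes minimized objectives (for example, the LOOCV error of an SVM and the number of selected genes) by min-max normalizing each one over the population. Diversity selection and the SVM itself are omitted, and all function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter.

    Plain Fisher score, used here only as a stand-in for the Fisher-Markov selector.
    """
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        vc = [v for v, y in zip(values, labels) if y == c]
        between += len(vc) * (mean(vc) - overall) ** 2
        within += len(vc) * pvariance(vc)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def top_k_features(X, labels, k):
    """Indices of the k highest-scoring features; X is a list of samples (rows)."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def binary_mutation(pop, i, F=0.5, rng=random):
    """Generic binary DE/rand/1-style mutation (hypothetical, not MOBDE's operator).

    Bit j of the base vector a is flipped with probability F wherever the
    difference vectors b and c disagree, mimicking a + F * (b - c) on bits.
    Requires at least 4 individuals in pop.
    """
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    a, b, c = pop[r1], pop[r2], pop[r3]
    return [1 - a[j] if b[j] != c[j] and rng.random() < F else a[j]
            for j in range(len(a))]


def normalized_sum(objs):
    """Sum of min-max normalized objectives (all minimized), one score per solution."""
    m = len(objs[0])
    lo = [min(o[k] for o in objs) for k in range(m)]
    hi = [max(o[k] for o in objs) for k in range(m)]
    return [sum((o[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                for k in range(m))
            for o in objs]


if __name__ == "__main__":
    X = [[1.0, 5.0, 0.2], [1.1, 5.2, 0.1], [3.0, 5.1, 0.3], [3.2, 4.9, 0.2]]
    print(top_k_features(X, [0, 0, 1, 1], k=2))  # → [0, 2]
    # Hypothetical (LOOCV error rate, number of selected genes) for three subsets
    print(normalized_sum([(0.10, 20), (0.05, 40), (0.20, 10)]))
```

Summing min-max normalized values puts objectives on very different scales (an error rate in [0, 1] versus a gene count in the tens) on an equal footing before they are combined; the solution with the smallest sum is preferred.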

Highlights

  • Gene expression data are characterized by thousands or even tens of thousands of measured genes but only a few tissue samples, which poses difficulties for many classifiers [1,2]

  • Results for 10 runs are listed in this table

  • The objective of this study is to provide a multiobjective optimization method for feature selection


Summary

Introduction

Gene expression data are characterized by thousands or even tens of thousands of measured genes but only a few tissue samples, which poses difficulties for many classifiers [1,2]. Using filter and wrapper techniques, many feature selection methods [5,6,7,8] have been proposed to improve the efficiency of the search and selection process. A novel correlation-based memetic framework (MA-C), which combines a genetic algorithm (GA) with local search (LS) using correlation-based filter ranking, was proposed in [9]. Its local filter method fine-tunes the population of GA solutions by adding or deleting features based on the symmetrical uncertainty (SU) measure. In [11], Xue et al. proposed three new initialization strategies and three new personal-best and global-best updating mechanisms for particle swarm optimization (PSO), yielding feature selection approaches that aim to maximize classification performance, minimize the number of features, and reduce computational time.
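The symmetrical uncertainty used by MA-C's local filter is commonly defined as SU(X, Y) = 2·IG / (H(X) + H(Y)), where H is Shannon entropy and IG = H(X) + H(Y) - H(X, Y) is the information gain (mutual information), so SU lies in [0, 1]. The sketch below computes it for discrete (for example, discretized expression) values and ranks features by their SU with the class label. The function names are hypothetical, and MA-C's exact add/delete rules are not reproduced.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), with IG = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treated here as carrying no information
    ig = hx + hy - entropy(list(zip(x, y)))  # joint entropy from paired values
    return 2.0 * ig / (hx + hy)


def rank_by_su(features, labels):
    """Indices of discrete feature columns sorted by decreasing SU with the labels."""
    return sorted(range(len(features)),
                  key=lambda j: symmetrical_uncertainty(features[j], labels),
                  reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(symmetrical_uncertainty([0, 0, 1, 1], labels))  # → 1.0
    print(symmetrical_uncertainty([0, 1, 0, 1], labels))  # → 0.0
```

A feature that perfectly tracks the class gets SU = 1, while one independent of it gets SU = 0; a local search in the spirit of MA-C could favor adding high-SU features and deleting low-SU ones.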
