Abstract

Background
Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. A major challenge is to design an effective method that eliminates irrelevant, redundant, or noisy genes from the classification while retaining all of the highly discriminative genes.

Results
We propose a gene selection method called local hyperplane-based discriminant analysis (LHDA). LHDA rests on two central ideas. First, it uses a local approximation rather than a global measurement; second, it embeds a recently reported classification model, the K-local hyperplane distance nearest neighbor (HKNN) classifier, into its discriminator. Through iterations driven by classification accuracy, LHDA learns a feature weight vector and finally extracts the optimal feature subset. The performance of the proposed method is evaluated in extensive experiments on synthetic and real microarray benchmark datasets. Eight classical feature selection methods, four classification models, including k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), Support Vector Machine (SVM) and Random Forest, and two popular embedded learning schemes are employed for comparison.

Conclusion
The proposed method yielded performance comparable or superior to that of seven state-of-the-art models. These results demonstrate the advantage of combining feature weighting and model learning in a unified framework that achieves both tasks simultaneously.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0629-6) contains supplementary material, which is available to authorized users.
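The HKNN classifier that LHDA embeds can be sketched as follows. For a query x and each class c, take the k nearest training points of class c, form their centroid N̄ and the difference vectors V_i = N_i - N̄, and measure the distance from x to the local hyperplane (affine hull) they span; x is assigned to the class with the smallest distance. This minimal pure-Python sketch assumes the regularized formulation in which α solves (VᵀV + λI)α = Vᵀ(x - N̄) and the distance is ‖x - N̄ - Vα‖. The function names, the defaults k = 3 and λ = 1.0, and the optional per-feature `weights` argument are hypothetical; the weights show one plausible way a feature weight vector could enter the distance, not necessarily how LHDA does it.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence

Vector = List[float]


def _solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_hyperplane_distance(x: Sequence[float],
                              neighbors: Sequence[Sequence[float]],
                              lam: float) -> float:
    """Distance from x to the regularized local hyperplane through `neighbors`."""
    if lam <= 0:
        # The V_i sum to zero, so V^T V is always singular without lam > 0.
        raise ValueError("lam must be positive")
    k, d = len(neighbors), len(x)
    centroid = [sum(nb[j] for nb in neighbors) / k for j in range(d)]
    v = [[nb[j] - centroid[j] for j in range(d)] for nb in neighbors]
    r = [x[j] - centroid[j] for j in range(d)]
    # Normal equations of min_alpha ||r - V alpha||^2 + lam * ||alpha||^2.
    gram = [[sum(v[i][j] * v[l][j] for j in range(d)) + (lam if i == l else 0.0)
             for l in range(k)] for i in range(k)]
    rhs = [sum(v[i][j] * r[j] for j in range(d)) for i in range(k)]
    alpha = _solve(gram, rhs)
    proj = [centroid[j] + sum(alpha[i] * v[i][j] for i in range(k)) for j in range(d)]
    return math.dist(x, proj)


def hknn_predict(x: Sequence[float], train_x: Sequence[Sequence[float]],
                 train_y: Sequence[Hashable], k: int = 3, lam: float = 1.0,
                 weights: Optional[Sequence[float]] = None) -> Hashable:
    """Assign x to the class whose local hyperplane is closest (HKNN)."""
    w = [1.0] * len(x) if weights is None else list(weights)

    def scale(p: Sequence[float]) -> Vector:
        # Hypothetical hook: rescale each feature by its weight before measuring.
        return [wi * pi for wi, pi in zip(w, p)]

    xq = scale(x)
    by_class: Dict[Hashable, List[Vector]] = defaultdict(list)
    for p, label in zip(train_x, train_y):
        by_class[label].append(scale(p))
    best_label, best_dist = None, math.inf
    for label, pts in by_class.items():
        nearest = sorted(pts, key=lambda p: math.dist(p, xq))[:k]
        dist = local_hyperplane_distance(xq, nearest, lam)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, with class 0 on the line y = 0 at x = 0, 1, 2, 3 and class 1 on the line y = 3 at the same x values, `hknn_predict([1.5, 0.5], X, y)` picks class 0, because the query lies about 0.53 from the class-0 local hyperplane and about 2.5 from the class-1 one. The linear algebra is written out by hand only to keep the sketch dependency-free; the accuracy-driven update of the weight vector that LHDA performs is not shown.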

Highlights

  • Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine

  • The filter methods, such as Relief [9] and Mutual Information [10], identify feature subsets from the original feature set based on specific evaluation criteria that are independent of a learning algorithm

  • The performance of local hyperplane-based discriminant analysis (LHDA) was evaluated in extensive experiments on various datasets


Summary

Introduction

Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. DNA microarray datasets can simultaneously determine the expression levels of thousands of genes [1]. For application purposes, these gene expression data must be classified into various categories [2]. The sample size of a genetic dataset is usually much smaller than the number of genes, which runs into the thousands or even tens of thousands [5]. Such a small number of high-dimensional samples is problematic for standard classification models. Feature selection techniques can be broadly categorized into three groups: filter, wrapper and hybrid [7, 8]. Filter methods, such as Relief [9] and Mutual Information [10], identify feature subsets from the original feature set based on specific evaluation criteria that are independent of a learning algorithm. Filter methods yield poor performance because they ignore interactions with the classifier, whereas wrapper methods are very computationally expensive.
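To make the filter idea concrete, Relief [9] scores each feature without consulting any classifier. The following is a minimal two-class sketch, assuming numeric features normalized by their range and Manhattan distance for finding neighbors. For each instance, it finds the nearest hit (closest same-class instance) and the nearest miss (closest other-class instance), then raises a feature's weight by how much it differs on the miss and lowers it by how much it differs on the hit. Unlike the original algorithm, which samples instances at random, this deterministic variant visits every instance once. The names `relief_weights` and `top_features` are hypothetical; keeping the m highest-weighted features is one simple way a filter turns scores into a feature subset.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Two-class Relief weights; every instance needs at least one hit and one miss."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]  # avoid division by zero

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        # Per-feature difference scaled to [0, 1] by the feature's range.
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i, r in enumerate(X):
        hits = [X[t] for t in range(n) if t != i and y[t] == y[i]]
        misses = [X[t] for t in range(n) if y[t] != y[i]]
        near_hit = min(hits, key=lambda p: dist(r, p))
        near_miss = min(misses, key=lambda p: dist(r, p))
        for j in range(d):
            # Reward separation from the miss, penalize spread within the class.
            w[j] += (diff(j, r, near_miss) - diff(j, r, near_hit)) / n
    return w


def top_features(weights: Sequence[float], m: int) -> List[int]:
    """Indices of the m highest-weighted features, best first."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:m]
```

On a toy set where feature 0 separates the two classes and feature 1 is noise, such as `X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [0.9, 0.2], [1.1, 0.5]]` with `y = [0, 0, 0, 1, 1, 1]`, feature 0 receives a positive weight and feature 1 a negative one. Because the score never consults a classifier, it is cheap, which is also why it can miss feature interactions that matter to the final model.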

