Abstract

Background: Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers.

Results: We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than through the global measurement typically used in existing methods. The weights obtained by our method are very robust against degradation from noisy features, even in data of vast dimensionality. To demonstrate the performance of our method, extensive classification experiments were carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB).

Conclusion: Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms.
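
For readers unfamiliar with the RELIEF family, the sketch below illustrates the classical margin-based weight update that LHR builds on: each feature's weight grows with its distance to the nearest example of the opposite class (the nearest miss) and shrinks with its distance to the nearest example of the same class (the nearest hit). This is only a minimal illustration of the baseline idea; the local hyperplane approximation that distinguishes LHR is not reproduced here, and the function name, the L1 neighbor search, and the cutoff in the usage note are our own illustrative choices, not details taken from the paper.

    import numpy as np

    def relief_weights(X, y):
        # Classical RELIEF-style margin update for binary labels (illustrative sketch).
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n_samples, n_features = X.shape
        w = np.zeros(n_features)

        for i in range(n_samples):
            diffs = np.abs(X - X[i])          # per-feature distances to every sample
            dists = diffs.sum(axis=1)         # L1 distance used for the neighbor search
            dists[i] = np.inf                 # never pick the instance itself
            same = (y == y[i])
            hit = np.argmin(np.where(same, dists, np.inf))    # nearest hit (same class)
            miss = np.argmin(np.where(~same, dists, np.inf))  # nearest miss (other class)
            w += diffs[miss] - diffs[hit]     # reward separation from the miss, penalize spread to the hit
        return w / n_samples

    # Example: rank features by weight and keep, e.g., the top 100 genes (illustrative cutoff).
    # weights = relief_weights(X_train, y_train)
    # top_genes = np.argsort(weights)[::-1][:100]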

Highlights

  • Modeling high-dimensional data involving thousands of variables is important for gene expression profiling experiments, yet it remains a challenging task

  • We tested it on nine medium to large benchmark microarray datasets, which were all used to investigate the relationship between cancers and gene expression

  • In this paper, we propose a new feature weighting scheme to overcome the common drawbacks of the RELIEF family

Introduction

Modeling high-dimensional data involving thousands of variables is important for gene expression profiling experiments, yet it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. Feature weighting is an important step in the preprocessing of data, especially in gene selection for cancer classification. Reducing the dimensionality of the feature space and selecting the most informative genes are therefore essential for effective classification. Feature selection methods are commonly divided into wrapper methods and filter methods. Wrapper methods use the predictive accuracy of predetermined classification algorithms (called base classifiers), such as the support vector machine (SVM), as the criterion for determining the goodness of a subset of features [1,2]. Filter methods select features according to discriminant criteria based on the characteristics of the data, independent of any classification algorithms [3,4,5]. Commonly used discriminant criteria include entropy measurements [6], Fisher ratio measurements [7], mutual information measurements [8,9,10], and RELIEF-based measurements [11,12].
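
As a concrete illustration of a filter criterion, the sketch below ranks genes by a per-gene Fisher ratio for a two-class problem, independently of any downstream classifier. The function name, the binary-class assumption, the small variance guard, and the cutoff of 50 genes in the usage note are illustrative assumptions, not details taken from the cited references.

    import numpy as np

    def fisher_ratio(X, y):
        # Fisher ratio of each gene for a two-class dataset:
        # (difference of class means)^2 / (sum of class variances).
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        classes = np.unique(y)
        if classes.size != 2:
            raise ValueError("this sketch assumes a binary labelling")
        X1, X2 = X[y == classes[0]], X[y == classes[1]]
        num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
        den = X1.var(axis=0) + X2.var(axis=0) + 1e-12   # guard against zero variance
        return num / den

    # Example: keep the 50 top-ranked genes before handing the data to a classifier.
    # scores = fisher_ratio(X_train, y_train)
    # selected = np.argsort(scores)[::-1][:50]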
