Abstract

Identifying discriminative features in information-rich data for clinical diagnosis is crucial in biomedical science. Many machine-learning techniques have been applied to this task with remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding single discriminative features, the multilayer feature subset selection method (MLFSSM) proposed herein employs randomized search and a multilayer structure to select a discriminative subset. On each layer of this method, many feature subsets are generated to ensure the diversity of the combinations, and the weights of features are evaluated from the performances of the subsets. The weight of a feature increases if, compared with other features on the current layer, the feature is selected into more subsets with better performance. In this manner, the feature weights are revised layer by layer, their precision steadily improves, and better subsets are repeatedly constructed from the features with higher weights. Finally, the top-performing feature subset of the last layer is returned. Experimental results on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than those of traditional feature selection methods, including LVW (a feature subset selection method that uses the Las Vegas method as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than some state-of-the-art methods that select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (K-TSP), and relative simplicity-based direct classifier (RS-DC).
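The layer-by-layer loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weight update formula, the classifier, or the parameter values, so everything concrete below is an assumption made for illustration. In particular, the names `mlfss_sketch`, `sample_subset`, and `loo_accuracy`, the nearest-centroid classifier with leave-one-out accuracy, the additive update "subset accuracy minus layer mean", the weight floor of 0.05, and the toy data are all hypothetical.

```python
import random


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    correct = 0
    for i, row in enumerate(X):
        sums, counts = {}, {}
        for j, other in enumerate(X):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(y[j], [0.0] * len(subset))
            for k, f in enumerate(subset):
                acc[k] += other[f]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def dist(label):
            # Squared distance from the held-out sample to a class centroid.
            return sum((row[f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(subset))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def sample_subset(weights, size, rng):
    """Draw `size` distinct feature indices, favouring higher weights."""
    pool, w, chosen = list(range(len(weights))), list(weights), []
    for _ in range(size):
        pick = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(pick))
        w.pop(pick)
    return sorted(chosen)


def mlfss_sketch(X, y, n_layers=4, n_subsets=30, subset_size=2, seed=0):
    """Revise feature weights layer by layer; return the last layer's best subset."""
    rng = random.Random(seed)
    weights = [1.0] * len(X[0])  # assumption: start from uniform weights
    best = None
    for _ in range(n_layers):
        # Generate many random subsets on this layer and score each one.
        scored = []
        for _ in range(n_subsets):
            subset = sample_subset(weights, subset_size, rng)
            scored.append((loo_accuracy(X, y, subset), subset))
        mean_acc = sum(a for a, _ in scored) / len(scored)
        # Assumed update rule: a feature gains weight when it sits in subsets
        # that beat the layer average and loses weight otherwise.
        for acc, subset in scored:
            for f in subset:
                weights[f] = max(0.05, weights[f] + (acc - mean_acc))
        best = max(scored)  # highest-accuracy subset on the current layer
    return best[1], best[0], weights


# Toy data (hypothetical): feature 0 separates the classes, features 1-5 do not.
y = [0] * 6 + [1] * 6
X = [[10.0 * y[i] + 0.1 * i] + [((2 * i + 3 * f) % 5) / 5.0 for f in range(1, 6)]
     for i in range(12)]
subset, acc, weights = mlfss_sketch(X, y)
print("selected subset:", subset, "accuracy:", acc)
```

Because later layers sample subsets in proportion to the revised weights, features that repeatedly appear in well-performing subsets are drawn more often, which is the mechanism the abstract describes. A real experiment would replace the toy classifier and data with the classifier and gene datasets used in the paper.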

Highlights

  • Identifying disease types/subtypes from biomedical data is very important to understand diseases and develop drugs, among other important functions

  • Depending on how the search for feature subsets is combined with the construction of the classification model, feature selection methods are divided into three categories: filter methods, wrapper methods, and embedded methods [10]

  • The weights of features are recalculated, and new subsets are regenerated using the weights on the following layer. The process is repeated until the terminal condition is met. The subset with the highest classification accuracy among those on the last layer is returned as the final result


Summary

Introduction

Identifying disease types/subtypes from biomedical data is very important for understanding diseases and developing drugs, among other important functions. In this context, many machine-learning techniques, including support vector machine (SVM) [1], random forest (RF) [2], and k-nearest neighbor (KNN) [3], have been applied in this field with remarkable performance [4, 5]. Depending on how the search for feature subsets is combined with the construction of the classification model, feature selection methods are divided into three categories: filter methods, wrapper methods, and embedded methods [10].

