Abstract

Gene/feature selection is an essential preprocessing step for building models with machine learning techniques. It also plays a critical role in biological applications such as the identification of biomarkers. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as the need for parameter tuning or poor performance. To tackle these limitations, this study introduces a universal wrapper approach based on an optimization algorithm of our own and the genetic algorithm (GA). In the proposed approach, candidate solutions have variable lengths and are scored by a support vector machine. To demonstrate the usefulness of the method, thirteen classification and regression datasets with different properties were chosen from various biological domains, including drug discovery, cancer diagnostics, and clinical applications. Our findings confirmed that the proposed method outperforms most other currently used approaches and frees users from the difficulties of tuning various parameters. As a result, users may optimize their biological applications, for example by obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
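To make the wrapper idea concrete, the following is a minimal sketch of an evolutionary search over variable-length feature subsets. It is not the authors' method: the world competitive contests algorithm is not reproduced here, and a leave-one-out nearest-centroid classifier stands in for the support vector machine so that the example needs only the Python standard library. All names (`fitness`, `mutate`, `crossover`, `select_features`), the size penalty, and the toy data are assumptions made for illustration.

```python
import random


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen columns.

    Stand-in scorer (assumption): the source scores candidates with an SVM.
    """
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # squared distance from the held-out sample to a class centroid
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def fitness(X, y, features, penalty=0.01):
    """Accuracy minus a small cost per feature (hypothetical trade-off)."""
    return nearest_centroid_accuracy(X, y, features) - penalty * len(features)


def mutate(features, n_features, rng):
    """Variable-length mutation: add, drop, or swap one feature index."""
    feats = set(features)
    op = rng.choice(["add", "drop", "swap"])
    unused = [f for f in range(n_features) if f not in feats]
    if op in ("add", "swap") and unused:
        if op == "swap":
            feats.remove(rng.choice(sorted(feats)))
        feats.add(rng.choice(unused))
    elif op == "drop" and len(feats) > 1:
        feats.remove(rng.choice(sorted(feats)))
    return sorted(feats)


def crossover(a, b, rng):
    """Child is a random-size sample from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, rng.randint(1, len(pool))))


def select_features(X, y, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    population = [sorted(rng.sample(range(n), rng.randint(1, n)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda f: fitness(X, y, f), reverse=True)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n, rng))
        population = parents + children
    return max(population, key=lambda f: fitness(X, y, f))


# Toy data (made up): column 0 separates the classes, columns 1-3 are noise.
demo_rng = random.Random(1)
X = [[label * 10 + demo_rng.random()] + [demo_rng.random() for _ in range(3)]
     for label in (0, 1) for _ in range(6)]
y = [label for label in (0, 1) for _ in range(6)]
best = select_features(X, y)
print(best, fitness(X, y, best))
```

Because each candidate is a list of feature indices rather than a fixed-length bit mask, the search can grow or shrink a subset freely, and the per-feature penalty favors smaller subsets when accuracy is equal. In practice the stand-in scorer would be replaced by cross-validated SVM performance.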

Highlights

  • Gene/feature selection is an essential preprocessing step for creating models using machine learning techniques

  • (iii) Ensemble methods: For feature selection (FS), ensemble methods create a learner such as a decision tree[33] and select features in such a way that the learner chooses them for generating a model[34,35]. Due to their greedy nature, ensemble methods may fall into local optima and fail to reach the optimal result. To deal with this limitation, we introduce the world competitive contests (WCC) algorithm, which features a low probability of falling into local optima

  • Several datasets with diverse properties have been selected from various sources such as the machine learning repository developed at the University of California Irvine (UCI)[45] and published seminar literature sources


Summary

Introduction

Gene/feature selection is an essential preprocessing step for creating models using machine learning techniques. It plays a critical role in different biological applications such as the identification of biomarkers. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as the need for parameter tuning or a low level of performance. To tackle such limitations, this study introduces a universal wrapper approach based on an optimization algorithm of our own and the genetic algorithm (GA). (iii) Positive features[8], which play a determinant role in distinguishing between samples and enhance the performance of a learner. Feature selection (FS) methods still need to be applied to such features, since some of them may play roles that are redundant with others, so a large set of them may be represented by a small set.
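The point about redundancy can be illustrated with a small sketch. It is a simple correlation filter, not the wrapper approach proposed in the study: a feature is dropped when it is almost perfectly correlated with one that has already been kept. The function names, the 0.95 threshold, and the toy columns are assumptions chosen for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


def drop_redundant(columns, threshold=0.95):
    """Greedily keep a column only if it is not strongly correlated with a kept one."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(idx)
    return kept


# Toy columns (made up): column 1 is exactly twice column 0, column 2 is different.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 3, 4, 1, 2],
]
kept = drop_redundant(columns)
print(kept)  # [0, 2]
```

Here three features are represented by two, because the second column adds nothing that the first does not already provide.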
