Abstract

As Python rapidly gains popularity for machine learning applications, the gap between the needs of machine learning engineers and the existing Python tools widens. The gap is especially noticeable in more classical machine learning fields such as feature selection, because community attention over the last decade has largely shifted to neural networks. This paper has two main purposes. First, we survey existing open-source Python and Python-compatible feature selection libraries, show their problems, if any, and demonstrate the gap between these libraries and the current state of the feature selection field. Second, we present a new open-source, scikit-learn-compatible library, ITMO FS (Information Technologies, Mechanics and Optics University feature selection), which is currently under development; explain how its architecture covers modern views on feature selection; and provide code examples of its use in Python together with a performance comparison against other Python feature selection libraries.

Highlights

  • The “curse of dimensionality” is one of the well-known problems in machine learning, as described in [1]

  • Most existing Python machine learning tools are oriented toward neural networks, which widens the gap between the methods that exist and the methods that are implemented in most classical machine learning fields

  • This paper contains an overview of existing Python feature selection libraries and of libraries in other languages that are compatible with Python

Summary

Introduction

The “curse of dimensionality” is one of the well-known problems in machine learning, as described in [1]. With growing data volumes and increasingly effective neural networks, this problem has faded in many fields, but it persists in several high-dimensional data domains, namely medical care, social analysis, and bioinformatics [2,3,4,5]. In such domains, the number of objects is relatively small while the number of features can reach several hundred thousand, which leads to a sparse object space and model overfitting.
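As a rough illustration of this setting, the sketch below builds a synthetic dataset with few objects and many features and ranks the features with a simple univariate filter based on absolute Pearson correlation with the label. It does not use the ITMO FS API or any other library discussed in the paper; the helper names `pearson` and `select_k_best`, the dataset sizes, and the choice of correlation as the scoring measure are all assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def select_k_best(X, y, k):
    """Return the sorted indices of the k features with the largest |correlation| with y.

    Hypothetical helper for this sketch, not part of any library.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Synthetic data (assumption): 20 objects, 500 features,
# of which only features 0 and 1 depend on the label.
rng = random.Random(0)
n_objects, n_features = 20, 500
y = [i % 2 for i in range(n_objects)]
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]  # pure noise
    row[0] = label + rng.gauss(0.0, 0.01)    # informative feature
    row[1] = -label + rng.gauss(0.0, 0.01)   # informative feature
    X.append(row)

selected = select_k_best(X, y, k=2)
print(selected)
```

A model trained on all 500 columns of such a dataset has far more parameters to fit than objects to learn from, whereas a model trained only on the selected columns does not. Real filters usually use more robust measures than plain correlation, and wrapper or embedded methods trade extra computation for feature sets tailored to a specific model.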

Background
Traditional Feature Selection Algorithms Categorization
Hybrid and Ensembling Feature Selection Algorithms
Feature Selection Algorithms Categorization by Input Data
Default Scikit-Learn Feature Selection
Boruta Methods
MLFeatureSelection Library
FES Book Support Code
ReBATE Algorithm
MLxtend
Caret in R
MLR in R
ITMO FS Library Architecture and Comparison
ITMO FS Library Usage Examples and Performance Tests
Conclusions
