Abstract
As the Python programming language rapidly gains popularity for machine learning applications, the gap between the needs of machine learning engineers and the existing Python tools is widening. This is especially noticeable in more classical machine learning fields, such as feature selection, because community attention over the last decade has shifted mainly to neural networks. This paper has two main purposes. First, we give an overview of existing open-source Python and Python-compatible feature selection libraries, describe their problems, if any, and demonstrate the gap between these libraries and the current state of the feature selection field. Then, we present a new open-source, scikit-learn compatible library, ITMO FS (Information Technologies, Mechanics and Optics University feature selection), that is currently under development, explain how its architecture covers modern views on feature selection, provide code examples showing how to use it with Python, and compare its performance with that of other Python feature selection libraries.
Highlights
The “curse of dimensionality” is one of the well-known machine learning problems, as described in [1]
Most existing Python machine learning tools are oriented toward neural networks, which results in an increasing gap between existing and implemented methods in most classical machine learning fields
This paper contains an overview of existing Python feature selection libraries and of libraries in other languages that are compatible with Python
Summary
The “curse of dimensionality” is one of the well-known machine learning problems, as described in [1]. With the growth of data volumes and the increasing effectiveness of neural networks, this problem has become less pressing in many fields, but it persists in several high-dimensional data domains, namely medical care, social analysis, and bioinformatics [2,3,4,5]. In such domains, the number of objects is relatively small while the number of features can reach several hundred thousand, which results in a sparse object space and model overfitting.
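The setting described above, few objects and many features, can be sketched with plain scikit-learn. This is an illustration only and does not use the ITMO FS API: the synthetic dataset, its size (100 objects, 2000 features, a small stand-in for the far larger feature counts mentioned above), the univariate F-test filter, and the choice of k=20 are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Assumed toy data: 100 objects, 2000 features, only 10 of them informative.
X, y = make_classification(
    n_samples=100,
    n_features=2000,
    n_informative=10,
    n_redundant=0,
    random_state=0,
)

# Baseline: a classifier trained on all features.
baseline = LogisticRegression(max_iter=1000)

# Filter-based feature selection placed inside the pipeline, so the
# selector is refit on each training fold and never sees the test fold.
filtered = make_pipeline(
    SelectKBest(score_func=f_classif, k=20),
    LogisticRegression(max_iter=1000),
)

baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
filtered_acc = cross_val_score(filtered, X, y, cv=5).mean()
print(f"all features:      {baseline_acc:.3f}")
print(f"top 20 by F-test:  {filtered_acc:.3f}")

# Inspect which feature indices the filter keeps when fit on the full data.
filtered.fit(X, y)
selected = filtered.named_steps["selectkbest"].get_support(indices=True)
print("selected feature indices:", selected)
```

Keeping the selector inside the pipeline matters here: selecting features on the full dataset before cross-validation would leak label information into the test folds and overstate accuracy.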