Abstract

Machine Learning (ML) requires a certain number of features (i.e., attributes) to train a model. One of the main challenges is to determine the right number and type of such features from the given dataset's attributes. It is not uncommon for the ML process to use all available features of a dataset without computing the predictive value of each. Such an approach makes the process vulnerable to overfitting, predictive errors, bias, and poor generalization. Each feature in the dataset is either uniquely predictive, redundant, or irrelevant. The key to better accuracy and fitting in ML is therefore to identify the optimum set (i.e., grouping) of features that best matches each feature's value. This paper proposes a novel approach to enhance the Feature Engineering and Selection (eFES) optimization process in ML. eFES is built using a unique scheme to regulate error bounds and to parallelize the addition and removal of a feature during training. eFES also introduces local gain (LG) and global gain (GG) functions, using 3D visualization techniques to assist the feature grouping function (FGF). FGF scores and optimizes the participating features so the ML process can decide which features to accept or reject for improved generalization of the model. To support the proposed model, this paper presents mathematical models, illustrations, algorithms, and experimental results. Miscellaneous datasets are used to validate the model-building process in Python, C#, and R. Results show the promising state of eFES as compared to the traditional feature selection process.
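As a concrete, purely illustrative companion to the accept/reject behavior the abstract describes, the sketch below shows a generic greedy feature-selection loop in Python, assuming a scikit-learn classifier and dataset. It is not the eFES algorithm itself: the LG, GG, and FGF functions are defined later in the paper, and the names used here (cv_score, selected) are illustrative only.

    # Minimal sketch (not the paper's eFES algorithm): greedily accept a feature
    # only if it improves cross-validated accuracy, otherwise reject it as
    # redundant or irrelevant.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    def cv_score(columns):
        # Cross-validated accuracy of a simple classifier restricted to `columns`.
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        return cross_val_score(model, X[:, columns], y, cv=5).mean()

    selected = []      # accepted feature indices (the "optimum grouping")
    best = 0.0
    for j in range(X.shape[1]):
        trial = selected + [j]
        score = cv_score(trial)
        if score > best:          # accept: the feature adds predictive value
            selected, best = trial, score
        # otherwise reject: the feature is redundant or irrelevant for this model

    print(f"kept {len(selected)}/{X.shape[1]} features, CV accuracy {best:.3f}")

Any feature whose addition fails to improve the cross-validated score is rejected, mirroring in a much simpler form the accept/reject decision that the abstract attributes to the FGF.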

Highlights

  • One of the most important research directions of Machine Learning (ML) is Feature Optimization (FO) (collectively grouped as Feature Engineering (FE), Feature Selection (FS), and Filtering) [1]

  • Feature quantification and function building governed by algorithms in the way we present is not found in the literature, and the dynamic ability of such a design, as our work indicates, can fill this gap in the state of the art

  • This ensures that we address the high-dimensionality issue: when features appear in high dimensions, they tend to change their value during training, so we determine the information gain using an entropy function (a minimal sketch of the standard form follows these highlights)
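A minimal sketch of the entropy-based information gain referenced in the last highlight follows. The paper's own equation is not reproduced on this page, so the snippet assumes the standard textbook form, H(T) = -sum_c p_c log2(p_c) and IG(T, a) = H(T) - sum_v (|T_v|/|T|) H(T_v):

    # Standard entropy-based information gain for one discrete feature.
    # This is the textbook form, assumed here because the paper's own
    # equation is not reproduced on this page.
    import numpy as np

    def entropy(labels):
        # H(T) = -sum_c p_c * log2(p_c) over the class distribution of `labels`.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(feature, labels):
        # IG(T, a) = H(T) - sum_v |T_v|/|T| * H(T_v), splitting on feature value v.
        gain = entropy(labels)
        for v in np.unique(feature):
            mask = feature == v
            gain -= mask.mean() * entropy(labels[mask])
        return gain

    # Toy check: a perfectly predictive binary feature yields IG = H(labels) = 1 bit.
    y = np.array([0, 0, 1, 1])
    x = np.array(["a", "a", "b", "b"])
    print(information_gain(x, y))   # 1.0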



Introduction

One of the most important research directions of Machine Learning (ML) is Feature Optimization (FO), collectively grouped as Feature Engineering (FE), Feature Selection (FS), and Filtering [1]. This includes reducing higher dimensions into lower ones to extract each feature's value, a step often overlooked by ML techniques. One approach applies an orthonormal transformation to reduce general errors; another uses the Bayes error probability [6] to evaluate a feature set. Datasets contain several variables/features, but not all of them contribute towards predictive modeling. In line with the latest progress and related study (see Section 2), the work proposed in this paper uses ML and mathematical models to advance such developments: a unique grouping of features is proposed, where the classifier learns to group an optimum set of features without consuming excessive computation. To keep the process computationally inexpensive and the accuracy higher, features should be categorized by the algorithm itself; another significance of such research is to let the algorithm determine that grouping on its own.
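The orthonormal-transformation route mentioned above is commonly realized with Principal Component Analysis (PCA), whose projection matrix has orthonormal rows. The following Python sketch only illustrates that cited approach on a stock scikit-learn dataset; it is not the eFES method proposed in this paper.

    # Dimensionality reduction via an orthonormal transformation (PCA),
    # shown only as an example of the approach cited in the introduction.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_wine(return_X_y=True)
    X_std = StandardScaler().fit_transform(X)

    pca = PCA(n_components=5)
    X_low = pca.fit_transform(X_std)            # 13 original features -> 5 components

    W = pca.components_                         # rows form an orthonormal basis
    print(np.allclose(W @ W.T, np.eye(5)))      # True: W is orthonormal
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained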

Related Work
Groundwork of eFES
Illustration of the conceptual view of LT
Illustration of the Feature Grouping Function
Results
[Figure: Sets of 20-grouped features]
Comparative Analysis
Conclusions
Future Works

