Abstract

Machine Learning (ML) requires a certain number of features (i.e., attributes) to train a model. One of the main challenges is to determine the right number and type of such features from the given dataset's attributes. It is not uncommon for the ML process to use all available features of a dataset without computing the predictive value of each. Such an approach makes the process vulnerable to overfitting, predictive errors, bias, and poor generalization. Each feature in the dataset is either uniquely predictive, redundant, or irrelevant. The key to better accuracy and fitting in ML is therefore to identify the optimum set (i.e., grouping) of features that best matches each feature's value. This paper proposes a novel approach to enhance the Feature Engineering and Selection (eFES) optimization process in ML. eFES is built using a unique scheme to regulate error bounds and to parallelize the addition and removal of a feature during training. eFES also introduces local gain (LG) and global gain (GG) functions, using 3D visualization techniques to assist the feature grouping function (FGF). FGF scores and optimizes the participating features so the ML process can decide which features to accept or reject for improved generalization of the model. To support the proposed model, this paper presents mathematical models, illustrations, algorithms, and experimental results. Miscellaneous datasets are used to validate the model-building process in Python, C#, and R. Results show the promising state of eFES as compared to the traditional feature selection process.
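As a concrete, purely illustrative companion to the accept/reject behavior the abstract describes, the sketch below shows a generic greedy feature-selection loop in Python, assuming a scikit-learn classifier and dataset. It is not the eFES algorithm itself: the LG, GG, and FGF functions are defined later in the paper, and the names used here (cv_score, selected) are illustrative only.

    # Minimal sketch (not the paper's eFES algorithm): greedily accept a feature
    # only if it improves cross-validated accuracy, otherwise reject it as
    # redundant or irrelevant.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    def cv_score(columns):
        # Cross-validated accuracy of a simple classifier restricted to `columns`.
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        return cross_val_score(model, X[:, columns], y, cv=5).mean()

    selected = []      # accepted feature indices (the "optimum grouping")
    best = 0.0
    for j in range(X.shape[1]):
        trial = selected + [j]
        score = cv_score(trial)
        if score > best:          # accept: the feature adds predictive value
            selected, best = trial, score
        # otherwise reject: the feature is redundant or irrelevant for this model

    print(f"kept {len(selected)}/{X.shape[1]} features, CV accuracy {best:.3f}")

Any feature whose addition fails to improve the cross-validated score is rejected, mirroring in a much simpler form the accept/reject decision that the abstract attributes to the FGF.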

Highlights

  • One of the most important research directions of Machine Learning (ML) is Feature Optimization (FO) (collectively grouped as Feature Engineering (FE), Feature Selection (FS), and Filtering) [1]

  • Feature quantification and function building governed by algorithms in the way we present is not found in the literature, and the dynamic ability of such a design, as our work indicates, can fill this gap in the state of the art

  • This ensures that we address the high-dimensionality issue: when features appear in high dimensions, they tend to change their value during training, so we determine the information gain using an entropy function (a minimal sketch of the standard form follows these highlights)
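A minimal sketch of the entropy-based information gain referenced in the last highlight follows. The paper's own equation is not reproduced on this page, so the snippet assumes the standard textbook form, H(T) = -sum_c p_c log2(p_c) and IG(T, a) = H(T) - sum_v (|T_v|/|T|) H(T_v):

    # Standard entropy-based information gain for one discrete feature.
    # This is the textbook form, assumed here because the paper's own
    # equation is not reproduced on this page.
    import numpy as np

    def entropy(labels):
        # H(T) = -sum_c p_c * log2(p_c) over the class distribution of `labels`.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(feature, labels):
        # IG(T, a) = H(T) - sum_v |T_v|/|T| * H(T_v), splitting on feature value v.
        gain = entropy(labels)
        for v in np.unique(feature):
            mask = feature == v
            gain -= mask.mean() * entropy(labels[mask])
        return gain

    # Toy check: a perfectly predictive binary feature yields IG = H(labels) = 1 bit.
    y = np.array([0, 0, 1, 1])
    x = np.array(["a", "a", "b", "b"])
    print(information_gain(x, y))   # 1.0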



Introduction

One of the most important research directions of Machine Learning (ML) is Feature Optimization (FO), collectively grouped as Feature Engineering (FE), Feature Selection (FS), and Filtering [1]. This includes reducing higher dimensions into lower ones to extract each feature's value, a step often overlooked by ML techniques. One approach applies an orthonormal transformation to reduce general errors; another uses the Bayes error probability [6] to evaluate a feature set. Datasets contain several variables/features, but not all of them contribute towards predictive modeling. In line with the latest progress and related study (see Section 2), the work proposed in this paper uses ML and mathematical models to advance such developments: a unique grouping of features is proposed, where the classifier learns to group an optimum set of features without consuming excessive computation. To keep the process computationally inexpensive and the accuracy higher, features should be categorized by the algorithm itself; another significance of such research is to let the algorithm determine that grouping on its own.
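The orthonormal-transformation route mentioned above is commonly realized with Principal Component Analysis (PCA), whose projection matrix has orthonormal rows. The following Python sketch only illustrates that cited approach on a stock scikit-learn dataset; it is not the eFES method proposed in this paper.

    # Dimensionality reduction via an orthonormal transformation (PCA),
    # shown only as an example of the approach cited in the introduction.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_wine(return_X_y=True)
    X_std = StandardScaler().fit_transform(X)

    pca = PCA(n_components=5)
    X_low = pca.fit_transform(X_std)            # 13 original features -> 5 components

    W = pca.components_                         # rows form an orthonormal basis
    print(np.allclose(W @ W.T, np.eye(5)))      # True: W is orthonormal
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained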

Related Work
Groundwork of eFES
Illustration of the conceptual view of LT
Illustration of the Feature Grouping Function
Results
[Figure: Sets of 20-grouped features]
Comparative Analysis
Conclusions
Future Works

