Abstract

In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least square regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables.

Highlights

  • It happens often that the physical or mathematical model behind an experiment or dataset is not known

  • We review a series of methods and algorithms that are used to find some subset(s) of the inputs that could possibly relate the inputs to outputs in an efficient way

  • The method that we present here has the potential to produce an optimal subset of variables, which is even interpretable in the presence of non-linear interaction between the inputs, resulting in a more efficient overall process of model selection

Read more

Summary

Introduction

It happens often that the physical or mathematical model behind an experiment or dataset is not known. Model selection (sometimes called subset selection) becomes an important feature during the data analysis endeavor. The methodology of selecting the best model from a set of inputs has constantly been examined by many authors [1]. Identifying the best subset among many variables is the most difficult part of this effort. The latter is exacerbated as the number of possible subsets grows exponentially, if the number of variables (parameters) grows linearly. There is a chance that the original input parameters to a model do not convey enough information. Some transformations of the original parameters, and interactions between them, are needed to make the data more available for information extraction

Objectives
Methods
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call