A non-linear data mining parameter selection algorithm for continuous variables.

Peyman Tavallali, Marianne Razavi, Sean Brady

doi:10.1371/journal.pone.0187676


Open Access

https://doi.org/10.1371/journal.pone.0187676


Journal: PLOS ONE	Publication Date: Nov 13, 2017
Citations: 9	License type: CC BY 4.0

Affiliation: California Institute of Technology

Abstract

In this article, we propose a new data mining algorithm that both captures the non-linearity in data and finds the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should be able to add a supplementary level of regression analysis that captures complex relationships in the data through mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs while also providing a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining the all-possible-subsets selection methodology with the inclusion of variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables.
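The general idea of combining variable transformations, interactions, and all-possible-subsets least squares can be illustrated with a minimal sketch. This is not the authors' algorithm: the choice of transformations (squares and logarithms), the function names `expand_features` and `best_subset`, and the use of training mean square error as the only criterion are assumptions made for illustration, and the sketch does not attempt the multicollinearity control described in the abstract.

```python
import itertools
import numpy as np

def expand_features(X, names):
    """Build candidate columns: originals, squares, logs (positive columns only),
    and pairwise products. The transformation set is an illustrative assumption."""
    cols, labels = [], []
    for j, name in enumerate(names):
        x = X[:, j]
        cols.append(x)
        labels.append(name)
        cols.append(x ** 2)
        labels.append(f"{name}^2")
        if np.all(x > 0):
            cols.append(np.log(x))
            labels.append(f"log({name})")
    for i, j in itertools.combinations(range(len(names)), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels

def best_subset(Z, y, max_size=2):
    """Exhaustively fit ordinary least squares (with intercept) on every subset
    of at most max_size candidate columns and keep the lowest training MSE."""
    n = Z.shape[0]
    best_mse, best_cols = np.inf, ()
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(Z.shape[1]), k):
            A = np.column_stack([np.ones(n), Z[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = float(np.mean((y - A @ coef) ** 2))
            if mse < best_mse:
                best_mse, best_cols = mse, subset
    return best_mse, best_cols

# Synthetic data (hypothetical): the response depends on an interaction term.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
y = 1.0 + 3.0 * X[:, 0] * X[:, 1] + 0.01 * rng.standard_normal(200)

Z, labels = expand_features(X, ["x1", "x2", "x3"])
mse, subset = best_subset(Z, y, max_size=1)
print([labels[j] for j in subset], mse)
```

Because training error never increases when a column is added, a practical criterion would also penalize model size or use held-out data; the sketch caps the subset size instead to keep the search small.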

Highlights

It often happens that the physical or mathematical model behind an experiment or dataset is not known.
We review a series of methods and algorithms that are used to find some subset(s) of the inputs that could relate the inputs to the outputs in an efficient way.
The method that we present here has the potential to produce an optimal subset of variables, which remains interpretable even in the presence of non-linear interactions between the inputs, resulting in a more efficient overall process of model selection.

Summary

Introduction

It often happens that the physical or mathematical model behind an experiment or dataset is not known. Model selection (sometimes called subset selection) then becomes an important part of the data analysis. The methodology of selecting the best model from a set of inputs has been examined by many authors [1]. Identifying the best subset among many variables is the most difficult part of this effort, and the difficulty is exacerbated because the number of possible subsets grows exponentially as the number of variables (parameters) grows linearly. There is also a chance that the original input parameters to a model do not convey enough information. In that case, some transformations of the original parameters, and interactions between them, are needed to make the data more amenable to information extraction.
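The exponential growth mentioned above is easy to quantify: p candidate variables admit 2^p - 1 non-empty subsets. The short sketch below also counts how the candidate pool itself grows once transformations and pairwise interactions are added. The helper names and the default of two transformations per variable are hypothetical choices for illustration, not values taken from the paper.

```python
from math import comb

def n_subsets(p):
    """Number of non-empty subsets of p candidate variables."""
    return 2 ** p - 1

def n_candidates(p, n_transforms=2):
    """Candidate columns after adding n_transforms transformations per variable
    and all pairwise interactions (an illustrative counting assumption)."""
    return p + n_transforms * p + comb(p, 2)

for p in (5, 10, 20):
    expanded = n_candidates(p)
    print(p, n_subsets(p), expanded, n_subsets(expanded))
```

Even for a modest number of original inputs, the expanded pool makes a brute-force search over every subset infeasible, which is why an efficient selection strategy matters.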



Similar Papers

Enhancing Efficacy of Machine Learning Model Selection Process for Big Data Science Projects by Introducing an Adaptive Method Based on Dynamic Factors
Arnav Goenka
International Journal of Research in Science and Technology | VOL. 13
01 Jan 2023

Neurodegenerative diseases categorization by applying the automatic model selection and hyperparameter optimization method
Mirta Fuentes-Ramos ... Iván-Vladimir Meza-Ruiz
Journal of Intelligent & Fuzzy Systems | VOL. 42
31 Mar 2022

Heuristic Optimization Methods for Dynamic Panel Data Model Selection: Application on the Russian Innovative Performance
Ivan Savin ... Peter Winker
SSRN Electronic Journal | VOL. -
01 Jan 2009

A Stepwise AIC Method for Variable Selection in Linear Regression
Toshie Yamashita ... Ryotaro Kamimura
Communications in Statistics - Theory and Methods | VOL. 36
03 Oct 2007
