Abstract

Thanks to the large quantity of data now available from industrial processes, several data-driven techniques have been developed in recent years. For all of these techniques, such as Principal Component Regression, Support Vector Machines, Artificial Neural Networks, and Neuro-Fuzzy Systems, the data on which they rely should be analyzed to find correlations and dependencies that could improve their design. For this reason, Input Variable Selection (IVS) has attracted growing interest. Classical IVS relies on classical statistics, such as the Pearson correlation coefficient, which capture linear dependencies among data; given the amount of data available today, detecting non-linear dependencies as well has become necessary, particularly for the design and development of neural networks. This paper proposes using a novel statistical tool, the Maximal Information Coefficient (MIC), to develop an IVS procedure that discovers dependencies in a large dataset and guides the designer in selecting input variables for a data-driven application. As a case study, the procedure is applied to a real application from the Swedish forest industry: choosing the input variables of a neural network that estimates the volume of timber bundles, a parameter that is expensive to measure in this context.
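The gap between linear and non-linear dependency measures can be illustrated with a minimal Python sketch. It is not the paper's procedure and it does not compute the actual MIC: `grid_dependency_score` is a hypothetical, simplified stand-in that takes the largest normalized mutual information over a few equal-width grids, whereas MIC searches over optimized grid partitions. The synthetic data and the `max_bins` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant sequence
    return [min(int((v - lo) / width), bins - 1) for v in values]


def grid_mutual_information(x, y, nx, ny):
    """Mutual information (in bits) of x and y on an nx-by-ny grid."""
    n = len(x)
    bx, by = equal_width_bins(x, nx), equal_width_bins(y, ny)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def grid_dependency_score(x, y, max_bins=6):
    """Simplified stand-in for MIC (assumption, not the real algorithm):
    best normalized mutual information over small equal-width grids."""
    best = 0.0
    for nx in range(2, max_bins + 1):
        for ny in range(2, max_bins + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / math.log2(min(nx, ny)))
    return best


# Synthetic example: y depends on x deterministically but not linearly.
x = [i / 100 - 1 for i in range(201)]
y_quad = [v * v for v in x]

print("Pearson:", pearson(x, y_quad))
print("Grid-based score:", grid_dependency_score(x, y_quad))
```

On this symmetric quadratic relationship the Pearson coefficient is essentially zero, while the grid-based score is clearly positive. A candidate input of this kind would be discarded by a purely linear screening but retained by a screening based on mutual information.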
