Abstract

Standard regression models are presented with n samples from an input space X, consisting of observational data of the form (xᵢ, y(xᵢ)), i = 1, ..., n, where each xᵢ denotes a k-dimensional input vector of design variables and y is the response. When k ≫ n, high variance and over-fitting become a major concern. In this paper we propose a novel approach to mitigate this problem by transforming the input vectors into new, smaller vectors (called the Z set) using only a set of simple statistical moments. A Genetic Algorithm (GA) is used to evolve the transformation procedure, searching for an optimal sequence of statistical moments and their input parameters. We use Linear Regression (LR) as an example to quantify the quality of the evolved transformation procedure. Empirical evidence, collected from benchmark functions and real-world problems, demonstrates that the proposed transformation approach dramatically improves LR generalisation and makes it outperform other state-of-the-art regression models such as Genetic Programming, Kriging, and Radial Basis Function Networks. In addition, we present an analysis to shed light on the statistical moments that are most useful for the transformation process.
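
The sketch below illustrates the general idea of the moment-based transformation described above: each k-dimensional input vector is replaced by a short vector of statistical moments before fitting a linear model. It is a minimal illustration only; the moment sequence here is fixed by hand rather than evolved with a GA, and the data, moment choices, and least-squares step are assumptions for demonstration, not the paper's actual procedure.

```python
# Minimal sketch of a moment-based input transformation (not the paper's
# GA-evolved procedure): map each k-dimensional input vector to a small
# vector of statistical moments, then fit an ordinary linear model.
import numpy as np
from scipy import stats

def moments_transform(X):
    """Replace each row (one k-dimensional input vector) with a short
    vector of simple statistical moments (the 'Z set' idea)."""
    return np.column_stack([
        X.mean(axis=1),             # first moment
        X.var(axis=1),              # second central moment
        stats.skew(X, axis=1),      # third standardised moment
        stats.kurtosis(X, axis=1),  # fourth standardised moment
    ])

# Toy data with k >> n: n = 50 samples, k = 500 design variables (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 500))
y = 3.0 * X.mean(axis=1) + rng.normal(scale=0.1, size=50)

Z = moments_transform(X)  # n x 4 instead of n x 500
# Ordinary least squares on the transformed inputs (with an intercept column).
design = np.column_stack([np.ones(len(Z)), Z])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("fitted coefficients:", beta)
```

In the paper, the choice and ordering of such moments (and their parameters) is what the GA optimises; the fixed four-moment sequence above is only a placeholder for that evolved procedure.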
