Comparative studies of model performance based on different data sampling methods

You Lv, Jizhen Liu, Tingting Yang

doi:10.1109/ccdc.2013.6561406

Abstract

This paper presents a comparative study of how different data sampling methods affect the performance of data-driven models. An engineering benchmark modeling problem is investigated, for which three sampling methods (orthogonal Latin sampling, uniform design sampling, and random sampling) are used to generate training data with different properties. Six typical data-driven modeling techniques are compared: three artificial intelligence methods (least squares support vector machine, BP neural network, and RBF neural network) and three statistical methods (multiple linear regression, and linear and nonlinear partial least squares regression). The root mean square error (RMSE), R square (R²), and mean relative error (MRE) are taken as the comparison criteria. The results reveal that data sampling and data properties play a key role in establishing an accurate data-driven model.
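
The three comparison criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the code below assumes the common textbook definitions (RMSE as the square root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares, and MRE as the mean of absolute errors divided by the absolute true values); the paper's exact definitions may differ. The function names `rmse`, `r_squared`, and `mre` are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (assumed standard definition)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mre(y_true, y_pred):
    """Mean relative error: mean of |t - p| / |t| (assumes no true value is zero)."""
    n = len(y_true)
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    y_true = [2.0, 4.0]
    y_pred = [1.0, 5.0]
    print(rmse(y_true, y_pred), r_squared(y_true, y_pred), mre(y_true, y_pred))
```

Under these assumed definitions, lower RMSE and MRE and an R² closer to 1 indicate a better fit, so the three criteria can be computed on the same test set for each model and each sampling method and then compared side by side.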

Full Text