The Non-parametric Regression Methods in Sequence Data Study

Autcha Araveeporn

doi:10.9734/bpi/nramcs/v6/3208a

Abstract

Regression analysis is a statistical methodology for modeling the relationship between a response variable and explanatory variables, so that the response can be estimated from the explanatory variables. However, the data do not always satisfy the assumptions of regression analysis. Non-parametric regression is proposed to avoid this problem. Also called smoothing, it estimates a smooth curve directly from the data and can capture non-linear relationships. This research aims to study and compare the efficiency of six non-parametric regression methods for sequence data: kernel smoothing, smoothing spline, natural cubic spline, B-spline, penalized spline, and trend filtering. The efficiency of each method is judged by the lowest average mean squared error. The smoothness of each fitted curve is controlled by a smoothing parameter chosen by cross-validation and by the number of knots, which is set so that the curve fits the data as closely as possible. The response variable is simulated with three patterns (trend, non-linear, and cycle), and the explanatory variable is defined as a sequence. The simulations are run in R with sample sizes of 50, 100, 150, and 200 and error standard deviations of 1, 3, and 5, with 500 replications in each situation. The results show that the natural cubic spline method gave the lowest average mean squared error in all cases. Across data patterns, the cycle data had the lowest mean squared error at every sample size. It can be concluded that the natural cubic spline method was the best-performing method for sequence data.
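The simulation design described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python rather than the study's R code, and it covers only one of the six methods (kernel smoothing, implemented here as a Nadaraya-Watson estimator with a Gaussian kernel). The cycle signal `cycle_signal`, the bandwidth grid, the leave-one-out form of cross-validation, and the choice to measure error against the noise-free curve are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def nw_smooth(x, y, h, leave_one_out=False):
    """Nadaraya-Watson kernel smoother (Gaussian kernel), evaluated at the points x."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    if leave_one_out:
        # Exclude each point from its own fit so the CV score is not trivially optimistic.
        np.fill_diagonal(w, 0.0)
    return (w @ y) / w.sum(axis=1)


def cv_bandwidth(x, y, grid):
    """Pick the bandwidth in `grid` with the lowest leave-one-out squared error."""
    scores = [np.mean((y - nw_smooth(x, y, h, leave_one_out=True)) ** 2) for h in grid]
    return grid[int(np.argmin(scores))]


def cycle_signal(x):
    """Hypothetical cycle pattern; the study's actual generating functions are not given."""
    return 3.0 * np.sin(2.0 * np.pi * x / 25.0)


def average_mse(n, sigma, reps, signal=cycle_signal,
                grid=(1.0, 2.0, 3.0, 5.0, 8.0), seed=0):
    """Average, over `reps` simulated data sets, of the MSE between fit and true curve."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1, dtype=float)  # explanatory variable is the sequence 1..n
    truth = signal(x)
    mses = []
    for _ in range(reps):
        y = truth + rng.normal(0.0, sigma, size=n)
        h = cv_bandwidth(x, y, grid)
        fit = nw_smooth(x, y, h)
        # Assumption: error is measured against the noise-free curve.
        mses.append(np.mean((truth - fit) ** 2))
    return float(np.mean(mses))


if __name__ == "__main__":
    # Same grid of sample sizes and error standard deviations as the abstract,
    # with far fewer replications than the study's 500 to keep the demo quick.
    for n in (50, 100, 150, 200):
        for sigma in (1, 3, 5):
            print(n, sigma, round(average_mse(n, sigma, reps=20), 4))
```

Comparing methods in the way the abstract describes would mean swapping `nw_smooth` for each of the other smoothers and repeating the same loop, then ranking the methods by the resulting averages.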
