Abstract
Functional data analysis techniques, such as penalized splines, have become common tools in a variety of applied research settings, where they are frequently used to estimate unknown functions from noisy data. The success of these estimators depends on choosing a tuning parameter that provides the correct balance between fitting and smoothing the data. Several smoothing parameter selection methods have been proposed for choosing a reasonable tuning parameter. The proposed methods generally fall into one of three categories: cross-validation methods, information-theoretic methods, or maximum likelihood methods. Despite the well-known importance of selecting an ideal smoothing parameter, there is little agreement in the literature regarding which method(s) should be considered when analyzing real data. In this paper, we address this issue by exploring the practical performance of six popular tuning methods under a variety of simulated and real data situations. Our results reveal that maximum likelihood methods outperform the popular cross-validation methods in most situations, especially in the presence of correlated errors. Furthermore, our results reveal that the maximum likelihood methods perform well even when the errors are non-Gaussian and/or heteroscedastic. For real data applications, we recommend comparing results using cross-validation and maximum likelihood tuning methods, given that these methods tend to perform similarly (differently) when the model is correctly (incorrectly) specified.
Highlights
Functional data analysis (FDA) considers the analysis of data that are realizations of a functional process [1,2,3].
The results reveal that, for all combinations of η and ρ, all of the methods tend to result in better function recovery as n increases, as expected.
The interesting finding is that the maximum likelihood-based methods (restricted maximum likelihood (REML) and ML) tend to produce root mean squared error (RMSE) values that are similar to or smaller than the RMSE values produced by the cross-validation-based methods (ordinary cross-validation (OCV) and generalized cross-validation (GCV)) and the information theory-based methods (AIC and Bayesian information criterion (BIC)).
Summary
Functional data analysis (FDA) considers the analysis of data that are (noisy) realizations of a functional process [1,2,3]. Such data can be found in many fields [4,5] and are becoming more common in the biomedical and social sciences, e.g., in the form of ecological momentary assessments [6,7] collected using smartphone apps. The nonparametric regression model considered in this paper can be interpreted as a functional extension of the simple model Y = μ + e, which is assumed for a one-sample location test. Note that spline smoothers assume a nonparametric regression model of the form Y_i = η(X_i) + e_i, where η(·) is the unknown mean function.
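To make the model and the role of the tuning parameter concrete, the following is a minimal sketch rather than the authors' implementation. It simulates data from Y_i = η(X_i) + e_i, fits a penalized spline over a grid of tuning parameter values, selects one by GCV, and reports the RMSE against the true function. Everything specific here is an assumption made for illustration: the test function sin(2πx), the noise level, the truncated linear basis with 20 knots, the ridge-type penalty, the grid of values, and all function and variable names.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Design matrix with columns 1, x, (x - k_1)_+, ..., (x - k_K)_+."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def fit_penalized_spline(x, y, knots, lam):
    """Return fitted values and effective degrees of freedom for one lam."""
    B = truncated_linear_basis(x, knots)
    # Penalize only the knot coefficients; intercept and slope stay unpenalized.
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    # Hat matrix H = B (B'B + lam * D)^{-1} B'
    H = B @ np.linalg.solve(B.T @ B + lam * D, B.T)
    return H @ y, np.trace(H)


def gcv_score(y, fitted, edf):
    """Generalized cross-validation: n * RSS / (n - edf)^2."""
    n = len(y)
    return n * np.sum((y - fitted) ** 2) / (n - edf) ** 2


# Simulate from Y_i = eta(X_i) + e_i (assumed eta and noise level).
rng = np.random.default_rng(0)
n, sigma = 200, 0.3
x = np.sort(rng.uniform(0.0, 1.0, n))
eta = np.sin(2 * np.pi * x)
y = eta + rng.normal(0.0, sigma, n)

knots = np.linspace(0.0, 1.0, 22)[1:-1]  # 20 interior knots
lambdas = 10.0 ** np.arange(-6, 5)       # candidate tuning parameters

# Pick the tuning parameter that minimizes the GCV score.
scores = []
for lam in lambdas:
    fitted, edf = fit_penalized_spline(x, y, knots, lam)
    scores.append(gcv_score(y, fitted, edf))
best_lam = lambdas[int(np.argmin(scores))]

# Function recovery is measured against the true eta, not the noisy y.
fitted_best, edf_best = fit_penalized_spline(x, y, knots, best_lam)
rmse = np.sqrt(np.mean((fitted_best - eta) ** 2))

# For contrast: the most heavily smoothed fit on the grid.
fitted_over, _ = fit_penalized_spline(x, y, knots, lambdas[-1])
rmse_oversmoothed = np.sqrt(np.mean((fitted_over - eta) ** 2))

print("selected lambda:", best_lam)
print("effective df:", edf_best)
print("RMSE (GCV choice):", rmse)
print("RMSE (largest lambda):", rmse_oversmoothed)
```

Small values of the tuning parameter let the fit follow the noise, while large values shrink it toward a straight line. GCV is only one of the criteria compared in the paper; swapping in another criterion would only change the function used to score each candidate value.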