Abstract

Predicting out-of-sample values is a problem of interest in any regression model. In the context of penalized smoothing using a mixed-model reparameterization, a general framework has been proposed for prediction in additive models without interaction terms. The aim of this paper is to generalize that work to models that include interaction terms, i.e., to the case where prediction is carried out in a multidimensional setting. Our method fits the data and predicts new observations simultaneously, and uses constraints to ensure a consistent fit or to impose further restrictions on the predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow us to include interaction terms that can be decomposed as a sum of several smooth functions. To illustrate the method, we use two real data sets: one to predict the mortality of the U.S. population on a logarithmic scale, and the other to predict the aboveground biomass of Populus trees as a smooth function of height and diameter. We examine the performance of the interaction and smooth-ANOVA models through simulation studies.

Highlights

  • For more than twenty years, penalized splines with B-spline bases [1] have been one of the most popular smoothing techniques in many fields of application; see [2] for a detailed review and [3,4] for a theoretical perspective. The possibility of reformulating a P-spline as a maximum likelihood estimator and best linear unbiased predictors (BLUPs) in a mixed-model framework has led to a wider generalization of the method to longitudinal data, spatial smoothing, correlated errors and Bayesian approaches [5,6].

  • Although out-of-sample prediction is performed in the context of a mixed model, the method is introduced in the original P-spline formulation, because the reparameterization required for out-of-sample prediction is based on this definition.

  • The restricted interaction model performs better than the unrestricted model in out-of-sample prediction, and the difference between the two models is apparent in most scenarios.


Summary

Introduction

The possibility of reformulating a P-spline as a maximum likelihood estimator and best linear unbiased predictors (BLUPs) in a mixed-model framework has led to a wider generalization of the method to longitudinal data, spatial smoothing, correlated errors and Bayesian approaches [5,6]. P-splines have been used to model mortality life tables and spatio-temporal data [7,8,9]. The method is used both to fit data and to obtain out-of-sample predictions: [10] used P-splines to forecast mortality life tables, and [11,12] used them in a three-dimensional spatio-temporal model to predict the risk of cancer death in the next few years. Some important issues still need to be addressed, for example, simultaneous prediction in more than one dimension, or prediction in models that include smooth main effects and interaction terms.
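To make the idea of fitting and forecasting at the same time concrete, the following is a minimal one-dimensional sketch in the original P-spline formulation. It is not the mixed-model, multidimensional method developed in the paper: it only illustrates the common device of extending the B-spline basis and the difference penalty over the prediction range and giving the unobserved points zero weight. The function names (`bspline_basis`, `pspline_fit_predict`), the knot spacing `h`, the smoothing parameter `lam` and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """B-spline basis on equally spaced knots (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - t[:-d - 1]) / (t[d:-1] - t[:-d - 1])
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def pspline_fit_predict(x_obs, y_obs, x_new, h=0.1, degree=3, lam=1.0, pord=2):
    """Fit and forecast in one penalized least-squares solve (illustrative)."""
    x_all = np.concatenate([x_obs, x_new])
    # Equally spaced knots covering observed AND prediction ranges.
    lo = int(np.floor(x_all.min() / h))
    hi = int(np.ceil(x_all.max() / h)) + 1
    knots = h * np.arange(lo - degree, hi + degree + 1)
    B = bspline_basis(x_all, knots, degree)
    # Zero weights: the new points do not contribute to the data term.
    w = np.concatenate([np.ones(len(x_obs)), np.zeros(len(x_new))])
    y_all = np.concatenate([y_obs, np.zeros(len(x_new))])  # placeholders
    # Difference penalty of order `pord` on the extended coefficient vector.
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    lhs = B.T @ (w[:, None] * B) + lam * D.T @ D
    rhs = B.T @ (w * y_all)
    theta = np.linalg.solve(lhs, rhs)
    fitted = B @ theta
    return theta, fitted[:len(x_obs)], fitted[len(x_obs):]

# Simulated example (assumed data, not from the paper).
rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 1.0, 50)
y_obs = np.sin(2 * np.pi * x_obs) + 0.1 * rng.standard_normal(50)
x_new = np.linspace(1.02, 1.5, 25)
theta, fit, forecast = pspline_fit_predict(x_obs, y_obs, x_new)
```

In this sketch the coefficients whose basis functions have no support on the observed data are determined by the penalty alone, so with a second-order penalty they continue the last fitted coefficients linearly. Extending this device to interaction terms is where the issues named above arise.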
