Abstract
A penalized approach is proposed for performing large numbers of parallel nonparametric analyses of either of two types: restricted likelihood ratio tests of a parametric regression model versus a general smooth alternative, and nonparametric regression. Compared with naïvely performing each analysis in turn, our techniques reduce computation time dramatically. Viewing the large collection of scatterplot smooths produced by our methods as functional data, we develop a clustering approach to summarize and visualize these results. Our approach is applicable to ultra-high-dimensional data, particularly data acquired by neuroimaging; we illustrate it with an analysis of developmental trajectories of functional connectivity at each of approximately 70,000 brain locations. Supplementary materials, including an appendix and an R package, are available online.
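To make the computational point concrete, the following is a minimal sketch of why smoothing many responses against a shared predictor can be far cheaper than fitting them one at a time. It is not the authors' algorithm or the API of their R package. It assumes a truncated-linear spline basis, a ridge penalty on the knot coefficients, and a single fixed smoothing parameter `lam` shared by all responses, whereas the paper selects smoothness by restricted likelihood. The function names `truncated_linear_basis` and `fit_many_smooths` are hypothetical.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_many_smooths(x, Y, knots, lam):
    """Fit one penalized spline per column of Y using a single linear solve.

    Assumption (not from the paper): every column shares the same lam.
    """
    X = truncated_linear_basis(x, knots)
    # Penalize only the knot coefficients; intercept and slope are unpenalized.
    P = np.diag([0.0, 0.0] + [1.0] * len(knots))
    A = X.T @ X + lam * P
    # A is built once and reused for all columns of Y.
    coef = np.linalg.solve(A, X.T @ Y)
    return X @ coef

# Simulated data: 500 responses observed at the same 100 predictor values.
rng = np.random.default_rng(0)
n, V = 100, 500
x = np.sort(rng.uniform(0.0, 1.0, n))
Y = np.sin(2 * np.pi * x)[:, None] + 0.3 * rng.standard_normal((n, V))
knots = np.linspace(0.05, 0.95, 15)

fitted = fit_many_smooths(x, Y, knots, lam=1.0)

# Naive reference: refit the first response on its own.
X = truncated_linear_basis(x, knots)
P = np.diag([0.0, 0.0] + [1.0] * len(knots))
single = X @ np.linalg.solve(X.T @ X + 1.0 * P, X.T @ Y[:, 0])
```

Because the penalized normal-equations matrix depends only on the predictor and the penalty, it is formed once and applied to every response column. The naive loop would rebuild and re-solve it for each response.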
Highlights
This paper is concerned with performing large numbers of nonparametric analyses in parallel
Our methodology has potential applications in genomics and other disciplines concerned with very high-dimensional data, but the motivation for our work comes from neuroimaging-based studies of brain development
The mgcv-based smooth is implausibly bumpy, and the reason for this is revealed in Fig. 4(d): mgcv has settled on a local maximum of the restricted maximum likelihood (REML) criterion, whereas the much smoother fit by MP reflects the global maximum
Summary
This paper is concerned with performing large numbers of nonparametric analyses in parallel. These application-specific considerations, as well as the general statistical benefits of penalized splines