Abstract

The use of description length principles to select an appropriate number of basis functions for functional data is investigated. A flexible definition of the dimension of a random function is provided, constructed directly from the Karhunen–Loève expansion of the observed process or data generating mechanism. The results show that although the classical principal component variance decomposition technique behaves in a coherent manner, in general the dimension it selects will not be consistent in the conventional sense. Two description length criteria are described. Both criteria are proved to be consistent, and it is shown that in low-noise settings they identify the true finite dimension of a signal embedded in noise. Two examples, one from mass spectroscopy and the other from climatology, illustrate the basic ideas. The application of different forms of the bootstrap for functional data is also explored and used to demonstrate the workings of the theoretical results.
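To make the idea of a description length criterion concrete, the sketch below scores each candidate dimension k from the eigenvalues of a sample covariance. It is not either of the authors' criteria: it swaps in the well-known MDL rule of Wax and Kailath for detecting the number of signals in noise, which compares the geometric and arithmetic means of the discarded eigenvalues and adds a penalty of k(2p − k)/2 · log n. The function name `mdl_dimension` and its arguments are hypothetical, and the parameter count in the penalty is an assumption.

```python
import math
from typing import Sequence


def mdl_dimension(eigenvalues: Sequence[float], n: int) -> int:
    """Return the k minimising a Wax-Kailath style MDL score.

    eigenvalues: positive sample covariance eigenvalues (any order).
    n: number of observed curves used to estimate the covariance.
    Hypothetical helper; a stand-in for, not a copy of, the paper's criteria.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    if p == 0 or any(v <= 0 for v in lam):
        raise ValueError("need a non-empty list of positive eigenvalues")

    best_k, best_score = 0, math.inf
    # Keep at least one eigenvalue in the 'noise' tail so both means are defined.
    for k in range(p):
        tail = lam[k:]
        m = len(tail)
        log_geo = sum(math.log(v) for v in tail) / m   # log geometric mean
        log_arith = math.log(sum(tail) / m)             # log arithmetic mean
        # Data code length: (near) zero when the tail eigenvalues are all equal.
        fit = n * m * (log_arith - log_geo)
        # Model code length: grows with the number of free parameters.
        penalty = 0.5 * k * (2 * p - k) * math.log(n)
        score = fit + penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k


if __name__ == "__main__":
    # Three dominant eigenvalues above a flat noise floor.
    lam = [10.0, 5.0, 2.0] + [0.1] * 47
    print(mdl_dimension(lam, n=200))  # 3
```

With a flat noise floor the fit term vanishes from k = 3 onward, so the penalty then favours the smallest such k; this trade-off between fit and model complexity is the general mechanism behind description length selection.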

Highlights

  • In the analysis of functional data, wherein each observation is a curve or image, it is commonly supposed that the random curves or functions $X$ are sampled from a stochastic process in $L^2[0,\tau]$. Here, $L^2[0,\tau]$ is the Hilbert space of square integrable functions on the interval $[0,\tau]$, with inner product $\langle f, g \rangle = \int_0^\tau f(t)\,g(t)\,dt$ for any two functions $f, g \in L^2[0,\tau]$ and induced squared norm $\|\cdot\|^2 = \langle \cdot, \cdot \rangle$.

  • By couching the concept of dimensionality directly in terms of the observed process, our definition obviates the need to posit explicitly the existence of separate signal and noise components. Data generating mechanisms that consist of a signal embedded in noise are encompassed as a special case, and we show that our definition coincides with the finite dimension of the signal in low-noise settings.

  • $X$ will be said to be a process of dimension $k_\alpha$ at signal-to-noise ratio (SNR) level $\alpha/(1-\alpha)$.
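One natural reading of these highlights, stated here as an assumption rather than as the paper's exact definition, is that $k_\alpha$ is the smallest number of Karhunen–Loève components whose eigenvalues account for at least a fraction $\alpha$ of the total variance; the ratio of explained to unexplained variance is then at least $\alpha/(1-\alpha)$, matching the quoted SNR level. The sketch also approximates the $L^2[0,\tau]$ inner product by the trapezoidal rule on a uniform grid. Both function names (`l2_inner`, `k_alpha`) are hypothetical.

```python
from typing import Sequence


def l2_inner(f: Sequence[float], g: Sequence[float], tau: float) -> float:
    """Trapezoidal approximation of <f, g> = int_0^tau f(t) g(t) dt.

    f and g hold values on the same uniform grid of len(f) points covering [0, tau].
    """
    if len(f) != len(g) or len(f) < 2:
        raise ValueError("need two equal-length grids with at least 2 points")
    h = tau / (len(f) - 1)                     # grid spacing
    prods = [a * b for a, b in zip(f, g)]
    # Trapezoidal rule: endpoints carry half weight.
    return h * (sum(prods) - 0.5 * (prods[0] + prods[-1]))


def k_alpha(eigenvalues: Sequence[float], alpha: float) -> int:
    """Smallest k whose leading eigenvalues explain at least a fraction alpha.

    Assumed definition: sum_{j<=k} lambda_j >= alpha * sum_j lambda_j,
    i.e. explained / unexplained variance >= alpha / (1 - alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    explained = 0.0
    for k, value in enumerate(lam, start=1):
        explained += value
        if explained >= alpha * total:
            return k
    return len(lam)


if __name__ == "__main__":
    t = [i / 100 for i in range(101)]          # uniform grid on [0, 1]
    print(l2_inner(t, [1.0] * 101, 1.0))       # approximately 0.5
    print(k_alpha([4.0, 2.0, 1.0, 0.5, 0.5], 0.8))  # 3
```

Raising $\alpha$ (a higher SNR level) can only keep or increase $k_\alpha$, which is why the dimension is indexed by the SNR level rather than fixed once and for all.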


Summary

Introduction

In the analysis of functional data, wherein each observation is a curve or image, it is commonly supposed that the random curves or functions $X$ are sampled from a stochastic process in $L^2[0,\tau]$, the space of square integrable functions on the interval $[0,\tau]$. Hall and Vial assume a signal-plus-noise model for the observed process and consider determining the dimension $k$ of the signal by examining the null hypothesis that the signal has fewer than $k$ dimensions. They show that for such a model the noise will be confounded with the signal, and suggest that, because this confounding makes it intrinsically impossible to estimate the full extent of the noise, conventional hypothesis testing techniques will not be effective. They use the bootstrap to construct a lower bound for the unconfounded part of the noise variance, and conclude that the assumed number of dimensions, $k$, is too small if this lower bound seems too large.
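The bootstrap step can be illustrated with a generic nonparametric bootstrap that resamples whole curves. The sketch below is a simplification under stated assumptions and not Hall and Vial's construction: it estimates the variance left after the first $k$ principal components of discretised curves and reports a lower percentile of its bootstrap distribution as a rough lower bound. The function names, the 5% level, and the quadrature weight `dt` are hypothetical choices.

```python
import numpy as np


def residual_variance(curves: np.ndarray, k: int, dt: float = 1.0) -> float:
    """Variance not captured by the first k principal components.

    curves: array of shape (n, p), one discretised curve per row.
    dt: grid spacing, so eigenvalues approximate those of the covariance operator.
    """
    cov = np.cov(curves, rowvar=False) * dt
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, descending
    eig = np.clip(eig, 0.0, None)                  # drop tiny negative round-off
    return float(eig[k:].sum())


def bootstrap_lower_bound(curves, k, n_boot=200, level=0.05, dt=1.0, seed=0):
    """Percentile lower bound for the residual variance after k components.

    Returns (lower_bound, bootstrap_statistics).
    """
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample curves with replacement
        stats[b] = residual_variance(curves[idx], k, dt)
    return float(np.quantile(stats, level)), stats


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 51)
    # Two-dimensional signal plus small white noise on the grid.
    scores = rng.normal(size=(100, 2)) * [2.0, 1.0]
    basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    noisy = scores @ basis + 0.1 * rng.normal(size=(100, t.size))
    lower, _ = bootstrap_lower_bound(noisy, k=2, dt=t[1] - t[0])
    print(f"lower bound on residual variance after 2 components: {lower:.4f}")
```

Resampling entire curves, rather than individual grid values, preserves the within-curve dependence that the covariance eigenvalues depend on.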

Basic Data Structures
Signal–plus–Noise Representations
The Signal-to-Noise Ratio and Dimension
Some Preliminary Results
Variance decomposition
Description Length
Illustrations
The Bootstrap