Abstract

Consider longitudinal data $$x_{ij}$$, with $$i=1,\ldots,n$$ and $$j=1,\ldots,p$$, where $$x_{ij}$$ is the observation of the smooth random function $$X_{i}(\cdot)$$ at time $$t_{j}$$. The goal of this paper is to develop a parsimonious representation of the data as a linear combination of a set of $$q<p$$ smooth functions $$H_{k}(\cdot)$$ ($$k=1,\ldots,q$$), in the sense that $$x_{ij}\approx \mu_{j}+\sum_{k=1}^{q}\beta_{ki}H_{k}(t_{j}).$$ This representation should be resistant to atypical $$X_{i}$$’s (“case contamination”), resistant to isolated gross errors at some cells $$(i,j)$$ (“cell contamination”), and applicable when some of the $$x_{ij}$$ are missing (“irregularly spaced”, or “incomplete”, data). Two approaches are proposed for this problem. The first addresses all three requirements stated above and is based on ideas similar to MM-estimation (Yohai in Ann Stat 15:642–656, 1987). The second is a simple and fast estimator that can be applied to complete data with casewise and cellwise contamination; it applies a standard robust principal components estimate and then smooths the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM estimator is competitive for incomplete data.
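As a rough illustration of the simple estimator and of the representation $$x_{ij}\approx \mu_{j}+\sum_{k=1}^{q}\beta_{ki}H_{k}(t_{j})$$, the Python sketch below assumes NumPy is available and makes several stand-in choices that are not taken from the paper. Spherical principal components (PCA of the curves centered at their spatial median and scaled to unit length) play the role of the "standard robust principal components estimate". A least-squares cubic polynomial fit plays the role of the smoother. The scores $$\beta_{ki}$$ are plain orthogonal projections onto the smoothed directions, so they are not resistant to cellwise outliers. All function names are hypothetical, and the sketch handles complete data only.

```python
import numpy as np


def spatial_median(X, n_iter=500, tol=1e-10):
    """L1 (spatial) median of the rows of X via Weiszfeld iterations."""
    m = np.median(X, axis=0)  # coordinatewise median as a starting point
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spherical_pca(X, q):
    """Stand-in robust PCA: spherical principal components."""
    mu = spatial_median(X)
    Z = X - mu
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    U = Z / norms  # each centered curve scaled to unit length
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    return mu, Vt[:q].T  # p x q matrix of leading directions


def smooth_directions(V, t, degree=3):
    """Stand-in smoother: least-squares polynomial fit of each direction in t."""
    B = np.vander(t, degree + 1, increasing=True)  # p x (degree + 1) basis
    coef, *_ = np.linalg.lstsq(B, V, rcond=None)
    H, _ = np.linalg.qr(B @ coef)  # re-orthonormalize the smoothed directions
    return H


def simple_fit(X, t, q, degree=3):
    """Hypothetical 'robust PCA, then smooth the directions' estimator."""
    mu, V = spherical_pca(X, q)
    H = smooth_directions(V, t, degree)
    beta = (X - mu) @ H  # scores by orthogonal projection (not cellwise robust)
    return mu, H, beta


def reconstruct(mu, H, beta):
    """Evaluate mu_j + sum_k beta_ki H_k(t_j) for all i, j."""
    return mu + beta @ H.T
```

Calling `simple_fit(X, t, q)` on an $$n \times p$$ array `X` observed at times `t` returns `mu` (length $$p$$), `H` ($$p \times q$$, orthonormal columns holding $$H_{k}(t_{j})$$) and `beta` ($$n \times q$$); `reconstruct` evaluates the approximation for every cell. Projecting each centered curve onto the unit sphere before the SVD bounds the influence of any single curve, which is why a few grossly atypical curves cannot dominate the estimated directions.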
