Abstract

This paper introduces a technique for estimating, and discovering underlying patterns in, a large number of curves observed with heteroscedastic errors. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The method consists of a series of steps. We first transform the data using an orthonormal basis of functions in L². In the transform domain, the non-parametric regression problem reduces to a means model. To estimate the means in the transform domain, we consider the class of linear or modulation estimators and proceed as in Beran and Dümbgen (Modulation of estimators and confidence sets, Ann. Stat. 26(5) (1998), pp. 1826–1856) by minimising Stein's unbiased risk estimate. By minimising the risk over a nested subset selection of modulators, we reduce the dimensionality of the means space. We show that in the transform space, the risk estimate is asymptotically optimal in Pinsker's minimax sense over Sobolev ellipsoids under heteroscedastic errors. Coefficient estimation and dimensionality reduction via optimal risk estimation are essential for accurate estimation of cluster membership. We illustrate our technique by estimating and clustering a large number of curves in both a synthetic example and a specific application. In this application, we analyse the research and development expenditure of a subset of companies in the Compustat Global database. We show that our method compares favourably to two alternative approaches.
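The transform and nested subset selection steps can be illustrated for a single curve with the minimal Python sketch below. It is not the paper's implementation. It assumes a discrete cosine basis, treats the pointwise noise variances as known (the paper takes them as unknown, so in practice they would have to be estimated), and uses the hypothetical helper names `cosine_basis` and `nested_subset_sure`. For a modulator that keeps the first k coefficients z_i with variances v_i, the unbiased risk estimate used here is the sum of v_i over kept coefficients plus the sum of (z_i² - v_i) over dropped ones.

```python
import numpy as np

def cosine_basis(n):
    """Orthonormal discrete cosine (DCT-II) basis; columns are basis vectors."""
    t = (np.arange(n) + 0.5) / n
    k = np.arange(n)
    U = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(t, k))
    U[:, 0] = np.sqrt(1.0 / n)  # constant basis vector has its own scaling
    return U

def nested_subset_sure(z, var):
    """Unbiased risk estimate for keeping the first k coefficients, k = 0..n.

    z   : coefficients in the transform domain
    var : variance of each coefficient (assumed known or pre-estimated)
    Returns (k_hat, risks), where risks[k] is the estimated risk for size k.
    """
    # Variance paid for the k coefficients that are kept.
    keep_cost = np.concatenate(([0.0], np.cumsum(var)))
    # Estimated squared bias from the dropped coefficients: z_i^2 - var_i.
    drop = z**2 - var
    drop_cost = np.concatenate((np.cumsum(drop[::-1])[::-1], [0.0]))
    risks = keep_cost + drop_cost
    return int(np.argmin(risks)), risks

# Synthetic curve with time-varying noise level (illustrative values only).
rng = np.random.default_rng(0)
n = 128
t = (np.arange(n) + 0.5) / n
mean = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
sd = 0.2 + 0.3 * t                       # heteroscedastic standard deviation
y = mean + sd * rng.normal(size=n)

U = cosine_basis(n)
z = U.T @ y                              # coefficients: the "means model"
var = (U**2).T @ sd**2                   # marginal variance of each coefficient
k_hat, risks = nested_subset_sure(z, var)
fitted = U[:, :k_hat] @ z[:k_hat]        # reconstruction from kept coefficients
print("selected dimension:", k_hat)
```

The retained coefficients `z[:k_hat]` give a low-dimensional representation of the curve. Clustering many curves on such representations is one plausible reading of how the dimensionality reduction feeds into cluster membership estimation, though the abstract does not spell out the clustering step.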
