Abstract

This paper establishes asymptotic properties for spiked empirical eigenvalues of sample covariance matrices for high-dimensional data with both cross-sectional dependence and a dependent sample structure. A new finding from the established theoretical results is that, under some scenarios, spiked empirical eigenvalues reflect the dependent sample structure instead of the cross-sectional structure, which indicates that principal component analysis (PCA) may provide inaccurate inference for cross-sectional structures. An illustrative example is provided to show that some commonly used statistics based on spiked empirical eigenvalues mis-estimate the true number of common factors. As an application to high-dimensional time series, we propose a test statistic to distinguish the unit root from the factor structure and demonstrate its effective finite sample performance on simulated data. Our results are then applied to analyze OECD healthcare expenditure data and U.S. mortality data, both of which possess cross-sectional dependence as well as non-stationary temporal dependence. It is worth mentioning that we contribute a statistical justification for the benchmark paper by Lee and Carter on mortality forecasting.

Highlights

  • With the rapid development of computer science, data from various scientific areas demonstrate the following features: high dimensionality, cross-sectional dependence, and a dependent sample structure

  • The results show that the spiked empirical eigenvalues are influenced by both the cross-sectional dependence and the dependent sample structure

  • When the effective rank of Ω, defined by r*(Ω) = Tr(Ω) / ‖Ω‖₂, tends to infinity, our theoretical results quantify the influence of dependent sample observations on spiked empirical eigenvalues. These results suggest that implementing principal component analysis (PCA) or factor analysis by using popular methods for high-dimensional dependent data settings may lead to an incorrect conclusion for certain non-stationary models
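
The effective rank in the last highlight, and the effect of a dependent sample structure on the leading empirical eigenvalue, can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's setup: ‖Ω‖₂ is read as the spectral norm, Ω is treated as an arbitrary symmetric positive semidefinite matrix (it is not defined in this excerpt), and the simulated design (independent Gaussian random walks compared with independent Gaussian noise, with hypothetical sizes T = 100 and p = 200) is chosen only for illustration. The helper names `effective_rank` and `top_eigenvalue_share` are likewise hypothetical.

```python
import numpy as np


def effective_rank(omega):
    """Return Tr(omega) / ||omega||_2 for a symmetric PSD matrix.

    Assumption: ||.||_2 is the spectral norm, i.e. the largest eigenvalue
    when omega is symmetric positive semidefinite.
    """
    eigvals = np.linalg.eigvalsh(omega)
    return eigvals.sum() / eigvals.max()


def top_eigenvalue_share(x):
    """Share of total sample variance carried by the largest eigenvalue.

    x has shape (T, p): T observations of a p-dimensional vector.
    The squared singular values of the column-centered data are
    proportional to the eigenvalues of the sample covariance matrix.
    """
    centered = x - x.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


rng = np.random.default_rng(0)
T, p = 100, 200  # illustrative sizes, not taken from the paper

# Case 1: independent noise, no cross-sectional or temporal dependence.
noise = rng.standard_normal((T, p))

# Case 2: p independent random walks (unit roots), still with no
# cross-sectional factor structure; only the sample dimension is dependent.
walks = np.cumsum(noise, axis=0)

share_noise = top_eigenvalue_share(noise)
share_walks = top_eigenvalue_share(walks)

# Two sanity checks for the effective rank.
rank_identity = effective_rank(np.eye(5))    # flat spectrum
rank_one = effective_rank(np.ones((5, 5)))   # single dominant direction

print(f"top eigenvalue share, noise: {share_noise:.3f}")
print(f"top eigenvalue share, walks: {share_walks:.3f}")
print(f"effective rank, identity:    {rank_identity:.1f}")
print(f"effective rank, all-ones:    {rank_one:.1f}")
```

The identity matrix has effective rank equal to its dimension, while the all-ones matrix has effective rank 1. In the simulated comparison, neither panel contains a common factor, yet the random-walk panel would be expected to show a far larger leading eigenvalue share, which is the kind of spurious spike the highlights warn about.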

Summary

Introduction

With the rapid development of computer science, data from various scientific areas demonstrate the following features: high dimensionality, cross-sectional dependence, and a dependent sample structure (including serial dependence in the time series setting). Mortality data indicate non-stationary temporal trending behavior. Another popular dataset is healthcare expenditure data accumulated from certain countries and observed over several years, which are usually modeled as panel data in econometrics (see Gao, Xia and Zhu [18]). This dataset exhibits strong cross-sectional dependence across countries, as well as a non-stationary tendency in the temporal direction. Complicated structures among sample observations incur a new curse for high-dimensional statistical analysis, in parallel with the curse of dimensionality.
