Abstract

Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points. First, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. Second, we apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
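The two-stage idea described above can be illustrated with a minimal Python sketch for a single change point. This is not the InspectChangepoint implementation: the function names (`cusum_transform`, `estimate_single_change`) and the tuning parameter `lam` are hypothetical, and the convex optimization step is replaced here by a simpler stand-in, entrywise soft-thresholding of the cumulative sum (CUSUM) matrix followed by a singular value decomposition. The univariate step is likewise reduced to taking the maximizer of the absolute CUSUM of the projected series.

```python
import numpy as np


def cusum_transform(x):
    """CUSUM transformation of a p x n data matrix; returns a p x (n-1) matrix.

    Entry (j, t-1) is the scaled difference between the mean of series j
    after time t and its mean up to time t, for t = 1, ..., n-1.
    """
    p, n = x.shape
    t = np.arange(1, n)                         # candidate split points
    left = np.cumsum(x, axis=1)[:, :-1]         # sums of the first t columns
    total = x.sum(axis=1, keepdims=True)        # row totals
    scale = np.sqrt(t * (n - t) / n)
    return scale * ((total - left) / (n - t) - left / t)


def estimate_single_change(x, lam):
    """Sketch of a sparse-projection estimate of one change point.

    Assumption: soft-thresholding at level `lam` stands in for the convex
    optimization problem mentioned in the abstract.
    Returns (estimated change location, estimated projection direction).
    """
    T = cusum_transform(x)
    # Soft-threshold to suppress coordinates that carry only noise.
    T_soft = np.sign(T) * np.maximum(np.abs(T) - lam, 0.0)
    if not T_soft.any():                        # everything thresholded away
        T_soft = T
    # Stage 1: projection direction = leading left singular vector.
    u, _, _ = np.linalg.svd(T_soft, full_matrices=False)
    v = u[:, 0]
    # Stage 2: univariate CUSUM on the projected series.
    projected = v @ x
    stat = np.abs(cusum_transform(projected[None, :])[0])
    return int(np.argmax(stat)) + 1, v
```

Here `x` holds one series per row, and the returned location `t` means the change is estimated to occur after the `t`-th observation. The threshold `lam` is a tuning parameter whose value has to be chosen by the user; larger values produce sparser projection directions.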

Highlights

  • One of the most commonly encountered issues with ‘big data’ is heterogeneity

  • When collecting vast quantities of data, it is usually unrealistic to expect that stylized, traditional statistical models of independent and identically distributed (IID) observations can adequately capture the complexity of the underlying data-generating mechanism

  • We study high dimensional time series that may have change points; we consider in particular settings where, at a change point, the mean structure changes in a sparse subset of the co-ordinates

Introduction

When collecting vast quantities of data, it is usually unrealistic to expect that stylized, traditional statistical models of independent and identically distributed (IID) observations can adequately capture the complexity of the underlying data-generating mechanism. Departures from such models may take many forms, including missing data, correlated errors and data combined from multiple sources, to mention just a few. Perhaps the simplest form of nonstationarity assumes that population changes occur at a relatively small number of discrete time points. If correctly estimated, these ‘change points’ can be used to partition the original data set into shorter segments, which can be analysed by using methods designed for stationary time series. The locations of these change points are often themselves of significant practical interest.
