Abstract
Detecting changepoints in data sets with many variates is a data science challenge of increasing importance. Motivated by the problem of detecting changes in the incidence of terrorism from a global terrorism database, we propose a novel approach to multiple changepoint detection in multivariate time series. Our method, which we call SUBSET, is a model-based approach that uses a penalised likelihood to detect changes for a wide class of parametric settings. We provide theory that guides the choice of penalties to use for SUBSET and that shows it has high power to detect changes regardless of whether only a few variates or many variates change. Empirical results show that SUBSET outperforms many existing approaches for detecting changes in mean in Gaussian data. Additionally, unlike these alternative methods, it can be easily extended to non-Gaussian settings, such as those appropriate for modelling counts of terrorist events.
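The penalised-likelihood idea can be illustrated with a minimal Python sketch for a single change in mean, assuming Gaussian data with known unit variance. For each candidate location τ it computes a per-variate statistic C_i(τ) (twice the log-likelihood ratio for a change in that variate) and scores a subset S of variates by Σ_{i∈S} C_i(τ) − α − β|S|. The linear penalty α + β|S|, the constants α and β, and the function names are hypothetical choices for illustration; they are not the penalty or code used by SUBSET.

```python
from typing import List, Optional, Tuple


def cusum_stat(x: List[float], tau: int) -> float:
    """Twice the log-likelihood ratio for a change in mean after index tau,
    assuming Gaussian noise with known unit variance."""
    n = len(x)
    left = sum(x[:tau]) / tau
    right = sum(x[tau:]) / (n - tau)
    return tau * (n - tau) / n * (left - right) ** 2


def subset_single_change(
    data: List[List[float]], alpha: float, beta: float
) -> Optional[Tuple[int, List[int], float]]:
    """Penalised-likelihood search for one change affecting a subset of variates.

    data[i] is the i-th variate; all variates have the same length n >= 2.
    The penalty alpha + beta * |S| is a hypothetical linear penalty.
    Returns (tau, subset, penalised gain) for the best change, or None if
    no candidate has a positive penalised gain (i.e. no change is declared).
    """
    n = len(data[0])
    best: Optional[Tuple[int, List[int], float]] = None
    for tau in range(1, n):
        stats = [cusum_stat(x, tau) for x in data]
        # With a linear penalty, the best subset at this tau is every
        # variate whose statistic exceeds the per-variate cost beta.
        subset = [i for i, c in enumerate(stats) if c > beta]
        gain = sum(stats[i] - beta for i in subset) - alpha
        if gain > 0 and (best is None or gain > best[2]):
            best = (tau, subset, gain)
    return best


if __name__ == "__main__":
    # Only the first of three variates shifts its mean after index 4.
    data = [[0, 0, 0, 0, 3, 3, 3, 3], [0] * 8, [1] * 8]
    print(subset_single_change(data, alpha=5.0, beta=2.0))
```

Because the penalty is linear, the optimal subset at each τ is found in a single pass over the variates. A penalty that depends on |S| in a more general way would instead require sorting the statistics and scanning over subset sizes, since the best subset of size k is always the k largest statistics.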
Highlights
The canonical one-dimensional changepoint analysis problem has been the focus of substantial research effort for many years.
Other recent substantive changepoint applications include the maintenance of safe carbon dioxide levels in spacesuits (Bekdash et al, 2020); detecting neuronal activity in calcium imaging data (Jewell et al, 2019); and assessing the effectiveness of interventions to contain the spread of the COVID-19 pandemic (Dehning et al, 2020).
To understand the behaviour of the test statistic for a single change, and to obtain guidelines for choosing the constants that define our penalty function, we study its theoretical properties for the canonical change in mean problem with Gaussian noise and a common, known variance, σ².
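For the Gaussian change in mean with known variance σ², twice the log-likelihood ratio for a single change after position τ in a series x_1, …, x_n reduces to τ(n − τ)/(nσ²) · (x̄_{1:τ} − x̄_{τ+1:n})², which is the reduction in residual sum of squares from allowing the change, scaled by σ². The minimal sketch below evaluates this statistic for every τ using running sums; the function names are illustrative and not taken from the paper.

```python
from typing import List, Tuple


def change_in_mean_stats(x: List[float], sigma2: float) -> List[float]:
    """Twice the log-likelihood ratio for a single change after tau,
    for tau = 1, ..., n - 1, assuming Gaussian noise with known variance sigma2."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    left_sum = 0.0
    stats = []
    for tau in range(1, n):
        left_sum += x[tau - 1]  # running sum of x_1, ..., x_tau
        left_mean = left_sum / tau
        right_mean = (total - left_sum) / (n - tau)
        # Scaled reduction in residual sum of squares from a change at tau.
        stats.append(tau * (n - tau) / n * (left_mean - right_mean) ** 2 / sigma2)
    return stats


def best_single_change(x: List[float], sigma2: float) -> Tuple[int, float]:
    """Return the most likely change location tau and its statistic."""
    stats = change_in_mean_stats(x, sigma2)
    best = max(range(len(stats)), key=stats.__getitem__)
    return best + 1, stats[best]


if __name__ == "__main__":
    print(best_single_change([0, 0, 0, 0, 2, 2, 2, 2], sigma2=1.0))  # (4, 8.0)
```

In the example the mean shifts from 0 to 2 after the fourth observation, so the statistic peaks at τ = 4 with value 4 · 4 / 8 · 2² = 8.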
Summary
The canonical one-dimensional changepoint analysis problem has been the focus of substantial research effort for many years. Much initial effort was placed on developing methodology to detect […]. In parallel with these developments, there has been a growing adoption of changepoint methods for real-world data problems in social and medical settings. Within the multivariate changepoint setting, the change in mean problem has so far received the most attention. Enikeeva and Harchaoui (2019) investigate the detection boundary for a change in mean in a high-dimensional asymptotic setting where the number of variates, d, the number of variates that change, and the number of observations per variate all increase. They show that there are two regimes, depending on whether the number of variates that change increases faster than √d (the dense regime) or at or slower than a √d rate (the sparse regime). Approach (i) works well in the dense regime, while approach (ii) works well in the sparse regime.
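The dense/sparse distinction can be illustrated by comparing two simple ways of aggregating per-variate statistics C_1, …, C_d, each approximately χ²₁ when the corresponding variate does not change. This is an illustration only: it is not necessarily the approaches (i) and (ii) referred to above, the function names are hypothetical, and the thresholds are rough leading-order approximations rather than calibrated critical values.

```python
import math
from typing import List


def sum_test(stats: List[float], z: float = 3.0) -> bool:
    """Dense-style test. Under no change each statistic is roughly chi^2_1
    (mean 1, variance 2), so the sum over d variates is compared with
    d + z * sqrt(2d), a rough normal-approximation threshold."""
    d = len(stats)
    return sum(stats) > d + z * math.sqrt(2 * d)


def max_test(stats: List[float]) -> bool:
    """Sparse-style test. The largest of d independent chi^2_1 variables
    grows like 2 * log(d) to leading order, so use that as a rough threshold."""
    return max(stats) > 2 * math.log(len(stats))


if __name__ == "__main__":
    # Stylised values (not data): unchanged variates set to their null mean 1.
    sparse = [20.0] * 2 + [1.0] * 98   # a few large changes
    dense = [3.0] * 50 + [1.0] * 50    # many moderate changes
    print(sum_test(sparse), max_test(sparse))  # False True
    print(sum_test(dense), max_test(dense))    # True False
```

With d = 100, the two large changes are missed by the sum test (total 138 against a threshold of about 142.4) but caught by the max test (20 against about 9.2), while fifty moderate changes give the reverse outcome (total 200 exceeds the sum threshold, but the largest statistic, 3, is below the max threshold).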
More From: Journal of the Royal Statistical Society Series A: Statistics in Society