Tidychangepoint: a unified framework for analyzing changepoint detection in univariate time series
- Research Article
7
- 10.1007/s11771-013-1466-2
- Jan 1, 2013
- Journal of Central South University
The detection of outliers and change points in time series has become a research focus in time series data mining, since it can be used for fraud detection, rare event discovery, event/trend change detection, etc. In most previous work, outlier detection and change point detection have not been related explicitly, and change point detection has not considered the influence of outliers. In this work, a unified detection framework is presented to deal with both. The framework is based on ALARCON-AQUINO and BARRIA’s change point detection method and adopts two-stage detection to separate outliers from change points. Its advantages are twofold. First, the unified structure for change detection and outlier detection reduces the computational complexity and simplifies the detection procedure. Second, performing outlier detection before change point detection avoids the influence of outliers on change point detection and thus improves its accuracy. Simulation experiments with the proposed method on both model data and actual application data achieved 100% detection accuracy. Comparisons between the traditional detection method and the proposed method further demonstrate that the unified detection structure is more accurate when the time series are contaminated by outliers.
- Research Article
119
- 10.1080/01621459.2015.1050493
- Oct 2, 2015
- Journal of the American Statistical Association
This article reviews some recent developments on the inference of time series data using the self-normalized approach. We aim to provide a detailed discussion of the use of self-normalization in different contexts and to highlight the distinctive features associated with each problem and the connections among these recent developments. The topics covered include: confidence interval construction for a parameter in a weakly dependent stationary time series setting, change point detection in the mean, robust inference in regression models with weakly dependent errors, inference for nonparametric time series regression, inference for long memory time series, locally stationary time series and near-integrated time series, change point detection, and two-sample inference for functional time series, as well as the use of self-normalization for spatial data and spatial-temporal data. Some new variations of the self-normalized approach are also introduced with additional simulation results. We also provide a brief review of related inferential methods, such as blockwise empirical likelihood and subsampling, which were recently developed under the fixed-b asymptotic framework. We conclude the article with a summary of merits and limitations of self-normalization in the time series context and potential topics for future investigation.
- Conference Article
5
- 10.1145/3459637.3482167
- Oct 26, 2021
Change point detection is widely used for finding transitions between states of data generation within a time series. Methods for change point detection currently assume this transition is instantaneous and therefore focus on finding a single point of data to classify as a change point. However, this assumption is flawed because many time series actually display short periods of transitions between different states of data generation. Previous work has shown Bayesian Online Change Point Detection (BOCPD) to be the most effective method for change point detection on a wide range of different time series. This paper explores adapting the change point detection algorithms to detect abrupt changes over short periods of time. We design a segment-based mechanism to examine a window of data points within a time series, rather than a single data point, to determine if the window captures abrupt change. We test our segment-based Bayesian change detection algorithm on 36 different time series and compare it to the original BOCPD algorithm. Our results show that, for some of these 36 time series, the segment-based approach for detecting abrupt changes can much more accurately identify change points based on standard metrics.
- Research Article
14
- 10.1007/s10115-019-01366-x
- May 20, 2019
- Knowledge and Information Systems
A critical problem in time series analysis is change point detection, which identifies the times when the underlying distribution of a time series abruptly changes. However, several shortcomings limit the use of some existing techniques in real-world applications. First, several change point detection techniques are offline methods, where the whole time series needs to be stored before change point detection can be performed. These methods are not applicable to streaming time series. Second, most techniques assume that the time series is low-dimensional and hence have problems handling high-dimensional time series, where not all dimensions may cause the change. Finally, most methods require user-defined parameters that need to be chosen based on the observed data, which limits their applicability to new unseen data. To address these issues, we propose an Information Gain-based method that does not require prior distributional knowledge for detecting change points and handles high-dimensional time series. The advantages of our proposed method compared to the state-of-the-art algorithms are demonstrated from theoretical basis, as well as via experiments on four synthetic and three real-world human activity datasets.
- Supplementary Content
- 10.17635/lancaster/thesis/629
- Jan 1, 2018
- University of Lancaster
This thesis focuses on the detection and prediction of changepoints in time series. In particular, we develop a range of methods, both parametric and non-parametric, to detect, predict, and forecast in the presence of changepoints. We consider a range of data applications, including economic, environmental and telematics data sets. The first part of this thesis concentrates on forecasting. We propose two flexible approaches to incorporate changepoints into the forecasting process. Additionally, we develop methodology to predict future changepoints in a time series. In particular, we can predict changepoints at future time points as well as changes near the end of the time series for which we do not yet have enough observations to detect. This also includes a new approach to pre-whitening time series that accounts for changes in the second order structure of the explanatory time series. The second part of this thesis is concerned with changepoint detection. We introduce methodology for detecting changes in both the variance and the autocovariance of time series. To do this we consider a local measure of the variance and the autocovariance over time. The approach is non-parametric and resilient to the presence of outliers.
- Research Article
- 10.1155/2022/6187110
- Mar 24, 2022
- Computational Intelligence and Neuroscience
Change-point detection (CPD) aims to find abrupt changes in time-series data. Various computational algorithms have been developed for CPD applications, and many performance metrics have been introduced to compare the different CPD models. Each of the previous evaluation methods measures a different aspect of the algorithms. Based on the existing weighted error distance (WED) method for single change-point (CP) detection, a novel WED metrics (WEDM) method was proposed to evaluate the overall performance of a CPD model across not only repetitive tests on single CP detection, but also successive tests on multiple change-point (MCP) detection on synthetic time series under the random slide window (RSW) and fixed slide window (FSW) frameworks. The proposed WEDM method introduces a normalized error distance that allows comparisons of the distance between the estimated change-point (eCP) position and the target change point (tCP) in the synthetic time series. In successive MCP detection, the WEDM method first divides the original time-series sample into a series of data segments according to the assigned set of tCPs and then calculates a normalized error distance (NED) value for each segment. Next, WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments in the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Finally, the mean WED (MWED) and MWTD (1-MWED) were obtained and used as key performance evaluation indexes. Using synthetic datasets on the Matlab platform, repetitive tests on single CP detection were executed with different CPD models, including ternary search tree (TST), binary search tree (BST), Kolmogorov–Smirnov (KS) test, t-test (T), and singular spectrum analysis (SSA) algorithms. Meanwhile, successive tests on MCP detection were implemented under the FSW and RSW frameworks. The CPD models mentioned above were evaluated in terms of the WED metrics, together with supplementary indexes for evaluating the convergence of different CPD models, including hit rate, miss rate, error rate, and computing time. The experimental results showed the value of the WEDM method.
- Research Article
6
- 10.24294/jgc.v6i1.2010
- Jun 10, 2023
- Journal of Geography and Cartography
In most studies on hydroclimatic variability and trend, change point detection analysis of time series data has not been considered. Understanding change points is crucial for managing water resources sustainably in the future, since a change point denotes a change in the status quo. When such a change has occurred, it is difficult to distinguish the rising or falling tendencies of the time series in various areas from trend analysis alone. This study’s primary goal was to describe, quantify, and confirm the homogeneity and change point detection of hydroclimatic variables, including mean annual, seasonal, and monthly rainfall, air temperature, and streamflow. Four homogeneity tests were applied, namely Pettitt’s test, Buishand’s test, the standard normal homogeneity test, and the von Neumann ratio test, at the 5% significance level. In order to choose the homogeneous stations, the test outputs were divided into three categories: “useful”, “doubtful”, and “suspect”. The results showed that most of the stations for annual rainfall and air temperature were homogeneous: 68.8% of the air temperature stations and 56.2% of the rainfall stations were classified as useful, whereas 100% of the streamflow stations were classified as useful. Overall, change point timings were identified at monthly, seasonal, and annual time scales. In the rainfall time series, no annual change points were detected. In the air temperature time series, all stations except Edagahamus experienced an increasing change point, while the streamflow time series experienced a decreasing change point except at the Agulai and Genfel hydro stations. Alterations in the streamflow time series without a noticeable change in the rainfall time series suggest that the change is caused by variables other than rainfall. Most probably, the observed abrupt alterations in streamflow result from alterations in catchment characteristics such as the subbasin’s land use and cover. These findings offer important details on the homogeneity and change points of the research area’s air temperature, rainfall, and streamflow, which planners, decision-makers, hydrologists, and engineers need for better water allocation strategy, impact assessment, and trend analysis.
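Two of the homogeneity statistics named in this abstract are simple enough to sketch. The Python below is a minimal illustration, not the study's code: the function names are hypothetical, the standard deviation uses divisor n (an assumption; conventions differ), and no critical values are included, so a real test would still need published tables for the chosen significance level.

```python
import math

def von_neumann_ratio(xs):
    """Ratio of squared successive differences to squared deviations.

    Values near 2 are expected for a homogeneous series; values well
    below 2 point to a break or trend.
    """
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - xs[i + 1]) ** 2 for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def buishand_rescaled_range(xs):
    """Buishand range statistic R / sqrt(n), from adjusted partial sums.

    Assumption: the standard deviation uses divisor n.
    """
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    partial = [0.0]  # S*_0 = 0
    running = 0.0
    for x in xs:
        running += x - mean
        partial.append(running)
    return (max(partial) - min(partial)) / (sd * math.sqrt(n))

# Toy series with one upward shift halfway through.
series = [0.0] * 10 + [1.0] * 10
print(von_neumann_ratio(series), buishand_rescaled_range(series))
```

Both statistics react to the same shift from different directions: the von Neumann ratio drops far below 2 because almost all successive differences are zero, while the Buishand range grows because the partial sums drift away from zero until the shift.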
- Conference Article
5
- 10.1109/fuzz-ieee.2015.7337988
- Aug 1, 2015
This paper provides evidence for the effectiveness of two extensions of Singular Spectrum Analysis, Complex SSA (CSSA) and Multivariate SSA (MSSA), when performing tasks such as smoothing, change point detection and forecasting of time series. CSSA is well suited for bivariate time series (usually displaying co-movements) and interval-valued time series. Functionally quasi-equivalent with CSSA in the bivariate case, MSSA comes, however, with its extra-potential for multivariate objects, such as fuzzy-valued time series (expressed in terms of α-levels). Our extension of the univariate SSA based change point detection algorithm to complex and multivariate cases is a novel approach. CSSA and MSSA are formally compared with each other and intensively tested in numerical experiments for smoothing, change point detection and forecasting with real-world data (a couple of foreign exchange rates with strong co-movements and a triangular-shaped fuzzy daily temperature time series).
- Research Article
3
- 10.1285/i20705948v11n1p208
- Apr 27, 2018
- Electronic Journal of Applied Statistical Analysis
Segmentation or change point detection is a very common topic in time series analysis, anomaly detection and pattern recognition. In our previous paper, the time series generated by sensors with 3D accelerometers were analysed. It was noticed that such series consist of segments of independent and correlated observations. Hence appropriate methods for change point detection for both data types must be implemented simultaneously. This paper provides an auxiliary comparison analysis which we intend to apply later to the above-mentioned acceleration data. The available methods usually require a long execution time, so comparing several methods is time-consuming. In the present publication we want to give additional help for selecting a suitable change point detection method and for finding a good parameter setting. Our analysis is performed on simulated time series that are normally distributed with constant but unknown mean and changes in variance.
- Research Article
261
- 10.1109/tkde.2006.1599387
- Apr 1, 2006
- IEEE Transactions on Knowledge and Data Engineering
We are concerned with the issue of detecting outliers and change points from time series. In the area of data mining, there has been increased interest in these issues since outlier detection is related to fraud detection, rare event discovery, etc., while change-point detection is related to event/trend change detection, activity monitoring, etc. Although, in most previous work, outlier detection and change point detection have not been related explicitly, this paper presents a unifying framework for dealing with both of them. In this framework, a probabilistic model of time series is incrementally learned using an online discounting learning algorithm, which can track a drifting data source adaptively by gradually forgetting out-of-date statistics. A score for any given data point is calculated in terms of its deviation from the learned model, with a higher score indicating a higher possibility of being an outlier. By taking an average of the scores over a window of a fixed length and sliding the window, we may obtain a new time series consisting of moving-averaged scores. Change point detection is then reduced to the issue of detecting outliers in that time series. We compare the performance of our framework with those of conventional methods to demonstrate its validity through simulation and experimental applications to incident detection in network security.
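The score-then-smooth idea in this abstract can be sketched in a few lines. The Python below is a simplified stand-in, not the paper's algorithm: it assumes a single Gaussian with exponentially discounted mean and variance in place of the paper's probabilistic time series model, and the names `discounted_gaussian_scores`, `moving_average`, and the discount rate `r` are hypothetical.

```python
import math

def discounted_gaussian_scores(xs, r=0.05, eps=1e-6):
    """Score each point by negative log-likelihood, then update the model.

    The model is one Gaussian whose mean and variance forget old data
    at rate r (assumption: stands in for the paper's learned model).
    """
    mean, var = xs[0], 1.0
    scores = []
    for x in xs:
        # Score first, using only what was learned from earlier points.
        scores.append(0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var))
        # Discounted update: old statistics decay with weight (1 - r).
        delta = x - mean
        mean += r * delta
        var = max((1 - r) * (var + r * delta * delta), eps)
    return scores

def moving_average(values, window):
    """Averages over a sliding window; entry j covers values[j:j+window]."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out

# Toy series: the mean jumps from about 0 to about 5 at index 100.
xs = [0.5 * math.sin(i) for i in range(100)] + \
     [5 + 0.5 * math.sin(i) for i in range(100, 200)]
scores = discounted_gaussian_scores(xs)
smoothed = moving_average(scores, window=5)
window_end = smoothed.index(max(smoothed)) + 4  # last index of the best window
print(scores.index(max(scores)), window_end)
```

A single outlier raises one raw score, while a change point raises the scores of several consecutive points, so the windows covering the change stand out in the smoothed series. Flagging outliers in that smoothed series is the second stage described in the abstract.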
- Research Article
6
- 10.3389/fphys.2023.1151312
- Apr 25, 2023
- Frontiers in Physiology
The development of compact and energy-efficient wearable sensors has led to an increase in the availability of biosignals. To effectively and efficiently analyze continuously recorded and multidimensional time series at scale, the ability to perform meaningful unsupervised data segmentation is an auspicious target. A common way to achieve this is to identify change-points within the time series as the segmentation basis. However, traditional change-point detection algorithms often come with drawbacks, limiting their real-world applicability. Notably, they generally rely on the complete time series to be available and thus cannot be used for real-time applications. Another common limitation is that they poorly (or cannot) handle the segmentation of multidimensional time series. Consequently, the main contribution of this work is to propose a novel unsupervised segmentation algorithm for multidimensional time series named Latent Space Unsupervised Semantic Segmentation (LS-USS), which was designed to easily work with both online and batch data. Latent Space Unsupervised Semantic Segmentation addresses the challenge of multivariate change-point detection by utilizing an autoencoder to learn a 1-dimensional latent space on which change-point detection is then performed. To address the challenge of real-time time series segmentation, this work introduces the Local Threshold Extraction Algorithm (LTEA) and a "batch collapse" algorithm. The "batch collapse" algorithm enables Latent Space Unsupervised Semantic Segmentation to process streaming data by dividing it into manageable batches, while Local Threshold Extraction Algorithm is employed to detect change-points in the time series whenever the computed metric by Latent Space Unsupervised Semantic Segmentation exceeds a predefined threshold. 
By using these algorithms in combination, our approach is able to accurately segment time series data in real time, making it well suited for applications where timely detection of changes is critical. When evaluated on a variety of real-world datasets, LS-USS systematically achieves equal or better performance than the other state-of-the-art change-point detection algorithms it is compared to, in both offline and real-time settings.
- Research Article
10
- 10.1016/j.ins.2022.12.099
- Jan 5, 2023
- Information Sciences
A multiple long short-term model for product sales forecasting based on stage future vision with prior knowledge
- Research Article
24
- 10.1007/s11269-014-0798-5
- Sep 19, 2014
- Water Resources Management
Change point detection is an effective tool to identify whether hydrological data are consistent. In this paper, the Pettitt test was first used to detect change points in annual rainfall and runoff time series in 6 selected sub-watersheds of the Luanhe river basin in the northeast part of China. Then we presented a method to detect change points according to the law of mutual change of quality and quantity in variable fuzzy sets. We chose the mean of the time series as the assessment index, as in other change point detection methods, and defined the 95% and 5% quantiles of the time series as the supremum and infimum, respectively. We selected a reference period (for example, the first 10 points of the time series) as the stationary period, and after the reference period we checked the mean value of the time series point by point. We used this method in the 6 sub-watersheds of the Luanhe river basin. The results of the 2 methods showed that most annual rainfall time series had no change point, and some annual runoff time series had a change point in 1979 or 1981. A comparison of the 2 methods indicated that the Pettitt test provided a reference for the variable fuzzy sets method, but the latter provided more reasonable results than the Pettitt test in this study. This method can also be used for other natural time series.
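For readers unfamiliar with the Pettitt test used here, the following Python is a minimal sketch of its standard form: a rank-based statistic U_t computed for every split point, the maximum absolute value K, and the usual approximate p-value. The function name and return layout are hypothetical, and the variable fuzzy sets method from the abstract is not attempted.

```python
import math

def pettitt_test(xs):
    """Return (t, K, p): most likely change position, statistic, approx p-value.

    t is 1-based and marks the last observation before the change.
    """
    n = len(xs)

    def sgn(v):
        return (v > 0) - (v < 0)

    u = 0
    best_t, k_stat = 0, -1
    for t in range(1, n):
        # Recurrence: U_t = U_{t-1} + sum_j sgn(x_t - x_j) over all j.
        u += sum(sgn(xs[t - 1] - xj) for xj in xs)
        if abs(u) > k_stat:
            best_t, k_stat = t, abs(u)
    # Approximate two-sided significance probability, capped at 1.
    p = min(1.0, 2.0 * math.exp(-6.0 * k_stat ** 2 / (n ** 3 + n ** 2)))
    return best_t, k_stat, p

# Toy series: ten low values followed by ten high values.
low = [1, 2] * 5
high = [11, 12] * 5
t, k_stat, p = pettitt_test(low + high)
print(t, k_stat, round(p, 4))
```

Because the statistic depends only on the signs of pairwise differences, it is insensitive to the size of the shift and to outliers. It also assumes at most one change point, which fits the single change years reported in the abstract.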
- Conference Article
30
- 10.1109/icbk.2017.36
- Aug 1, 2017
Multivariate time series, which is a set of ordered observations for multiple variables, is pervasively generated in air condition, traffic, entertainment, etc. Echo State Network has shown promising performance for processing multivariate time series due to its ability to approximate sequential dynamics. However, the intrinsic relationships among time series have not been generally analyzed in the previous Echo State Network based methods. These relationships may help reveal the intrinsic characteristics of multivariate time series and benefit the classification performance. In this paper, we propose a novel method for approximating the sequential dynamics and learning the relationship among multiple variables explicitly in a unified framework. We learn a model for each multivariate time series and evaluate the distance of the original multivariate time series by the distance of their models. The relationship among variables in a multivariate time series is learnt according to Granger causality. We further constrain the sparsity of the learnt time series models to find the Focal series which help explain all the series. Experiments on benchmark datasets demonstrate superior classification performance of the proposed method.
- Research Article
31
- 10.1080/07474946.2011.563710
- Apr 1, 2011
- Sequential Analysis
This article deals with off-line detection of change points, for time series of independent observations, when the number of change points is unknown. We propose a sequential analysis method with linear time and memory complexity. Our method is based on a filtered derivative method that detects the right change points as well as false ones. We improve the filtered derivative method by adding a second step in which we compute the p-values associated with every single potential change point. Then, we eliminate false alarms; that is, the change points that have p-values smaller than a given critical level. Next, we apply our method and a penalized least squares criterion procedure to detect change points on simulated data sets, and we compare them. Finally, we apply the filtered derivative with p-value method to the segmentation of heartbeat time series and to the detection of change points in the average daily volume of financial time series.
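The first step described in this abstract, the filtered derivative, can be sketched as the difference between the means of two adjacent sliding windows. The Python below is an illustrative sketch only: the function names, the `window` and `threshold` parameters, and the greedy peak selection are assumptions, and the second step (computing a p-value for each candidate) is deliberately left out.

```python
def filtered_derivative(xs, window):
    """D[k] = mean(xs[k:k+window]) - mean(xs[k-window:k]) for each valid k."""
    n = len(xs)
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)  # prefix sums keep this step linear
    d = {}
    for k in range(window, n - window + 1):
        right = prefix[k + window] - prefix[k]
        left = prefix[k] - prefix[k - window]
        d[k] = (right - left) / window
    return d

def candidate_change_points(xs, window, threshold):
    """Greedily pick peaks of |D| above threshold, one per neighbourhood.

    Assumption: a simple, non-optimized stand-in for peak selection.
    """
    remaining = filtered_derivative(xs, window)
    candidates = []
    while remaining:
        k = max(remaining, key=lambda i: abs(remaining[i]))
        if abs(remaining[k]) < threshold:
            break
        candidates.append(k)
        # Drop the neighbourhood so one jump yields one candidate.
        for j in range(k - window + 1, k + window):
            remaining.pop(j, None)
    return sorted(candidates)

# Toy series with level shifts at indices 20 and 40.
xs = [0.0] * 20 + [4.0] * 20 + [1.0] * 20
print(candidate_change_points(xs, window=5, threshold=2.0))  # [20, 40]
```

Each returned index is the 0-based position of the first observation after the change. On noisy data this step also returns spurious peaks, which is why the abstract adds a second filtering step.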