Abstract

A new symbolic representation of time series, called ABBA, is introduced. It is based on an adaptive polygonal chain approximation of the time series into a sequence of tuples, followed by a mean-based clustering to obtain the symbolic representation. We show that the reconstruction error of this representation can be modelled as a random walk with pinned start and end points, a so-called Brownian bridge. This insight allows us to make ABBA essentially parameter-free, except for the approximation tolerance, which must be chosen. Extensive comparisons with the SAX and 1d-SAX representations are included in the form of performance profiles, showing that ABBA is often able to better preserve the essential shape information of time series compared to other approaches, in particular when time warping measures are used. Advantages and applications of ABBA are discussed, including its in-built differencing property and use for anomaly detection, and Python implementations are provided.
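The first stage described in the abstract, an adaptive polygonal chain approximation into tuples, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `compress` and `reconstruct`, the (length, increment) tuple layout, and the stopping rule (squared error of a piece bounded by the number of interior points times `tol` squared) are assumptions chosen for illustration. The clustering stage is omitted.

```python
def _squared_error(ts, i, j):
    """Squared error between ts[i..j] and the straight line joining ts[i] and ts[j]."""
    length = j - i
    inc = ts[j] - ts[i]
    return sum((ts[i] + inc * (k - i) / length - ts[k]) ** 2 for k in range(i, j + 1))


def compress(ts, tol=0.1):
    """Greedily cover ts with linear pieces, returned as (length, increment) tuples.

    Assumed stopping rule (illustrative): a piece is extended while its squared
    error stays below (number of interior points) * tol**2.
    """
    pieces = []
    start = 0
    n = len(ts)
    while start < n - 1:
        end = start + 1
        while end + 1 < n:
            candidate = end + 1
            bound = (candidate - start - 1) * tol ** 2
            if _squared_error(ts, start, candidate) > bound:
                break
            end = candidate
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces


def reconstruct(pieces, start_value):
    """Rebuild a polygonal chain from the tuples by linear interpolation."""
    out = [start_value]
    for length, inc in pieces:
        base = out[-1]
        for step in range(1, length + 1):
            out.append(base + inc * step / length)
    return out


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0]
    pieces = compress(series, tol=0.1)
    print(pieces)  # [(3, 3), (3, -3)]
    print(reconstruct(pieces, series[0]))
```

Because each tuple stores an increment rather than an absolute value, the representation depends only on changes in the series, which is one way to read the in-built differencing property mentioned in the abstract.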

Highlights

  • Symbolic representations of time series are an active area of research, being useful for many data mining tasks including dimension reduction, motif and rule discovery, prediction, and clustering of time series

  • Aside from verifying that adaptive Brownian bridge-based aggregation (ABBA) can represent time series to higher accuracy than Symbolic Aggregate approXimation (SAX) and 1d-SAX using a comparable number of symbols k and string length n, we find that SAX outperforms 1d-SAX when the same number of symbols k is used for both

  • We introduced ABBA, an adaptive symbolic time series representation which aims to preserve the essential shape of a time series


Summary

Introduction

Symbolic representations of time series are an active area of research, being useful for many data mining tasks including dimension reduction, motif and rule discovery, prediction, and clustering of time series. Symbolic time series representations allow for the use of algorithms from text processing and bioinformatics. The time series considered is sampled at equidistant time points, with values t0, t1, and so on. Despite the large number of dimension-reducing time series representations in the literature, very few are symbolic. Most techniques are numeric in the sense that they reduce a time series to a lower-dimensional vector with its components taken from a continuous range; see Bettaiah and Ranganath (2014), Fu (2011), and Lin et al. (2007) for reviews. The construction of symbolic time series representations typically consists of two parts. The first part segments the time series; the second part, the discretization process, assigns a symbol to each segment.
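The two-part construction described above can be sketched generically. The example below is deliberately simplified and is neither SAX nor ABBA: it uses fixed-length segments summarised by their means, followed by equal-width binning into k symbols. The function names `segment_means` and `discretize` are hypothetical, as is the choice of lowercase letters as the alphabet.

```python
import string


def segment_means(ts, n_segments):
    """Part 1 (segmentation): split ts into n_segments chunks and return each mean.

    Assumes 1 <= n_segments <= len(ts), so that no chunk is empty.
    """
    n = len(ts)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [sum(ts[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def discretize(values, k):
    """Part 2 (discretization): map each value to one of k letters via equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    symbols = string.ascii_lowercase[:k]
    return "".join(symbols[min(int((v - lo) / width), k - 1)] for v in values)


if __name__ == "__main__":
    series = [0, 0, 1, 1, 4, 4, 9, 9]
    means = segment_means(series, n_segments=4)
    print(means)                   # [0.0, 1.0, 4.0, 9.0]
    print(discretize(means, k=3))  # aabc
```

Here the string length n equals the number of segments and k is the alphabet size, matching the roles those symbols play in the highlights. An adaptive method would instead choose the segment boundaries from the data.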

