Using the minimum description length to discover the intrinsic cardinality and dimensionality of time series

Bing Hu, Thanawin Rakthanmanon, Stefano Lonardi, Scott Evans, Yuan Hao, Eamonn Keogh

doi:10.1007/s10618-014-0345-2

Abstract

Many algorithms for data mining or indexing time series data do not operate directly on the raw data; instead, they use alternative representations that include transforms, quantization, approximation, and multi-resolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this work, we investigate the problem of discovering the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to automatically discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms: characterizing data in this manner is useful in its own right and is an important subroutine in algorithms for classification, clustering and outlier discovery. We frame the discovery of these intrinsic features in the Minimum Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and more accurate than previous methods, and has the important advantage of being essentially parameter-free.
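To make the MDL framing concrete, here is a minimal sketch of choosing a cardinality by minimizing a two-part description length, DL(H) + DL(T|H). Everything specific here is an assumption for illustration, not the paper's exact encoding: the series is first quantized to a 256-level reference grid (`ref_card`), each candidate cardinality c is charged log2(c) bits per point for the hypothesis, the residual against the reference is charged at its empirical entropy, and candidates run from 2 to 64. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical Shannon entropy (bits per symbol) of a sequence of symbols."""
    n = len(values)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(values).values())


def quantize(x, c, lo, hi):
    """Map each value to an integer level in [0, c-1] on a uniform grid over [lo, hi]."""
    if hi == lo:
        return [0] * len(x)
    step = (hi - lo) / (c - 1)
    return [min(c - 1, max(0, round((v - lo) / step))) for v in x]


def description_length(x, c, ref_card=256):
    """Simplified two-part cost DL(H) + DL(T|H), in bits, of x under cardinality c."""
    lo, hi = min(x), max(x)
    ref = quantize(x, ref_card, lo, hi)   # the data T at a fixed reference resolution
    levels = quantize(x, c, lo, hi)       # the hypothesis H: one of c levels per point
    # Re-express the hypothesis on the reference grid so it can be compared with T.
    hyp = [round(lev * (ref_card - 1) / (c - 1)) for lev in levels]
    residual = [r - h for r, h in zip(ref, hyp)]
    n = len(x)
    model_bits = n * math.log2(c)               # DL(H): log2(c) bits per point
    residual_bits = n * entropy_bits(residual)  # DL(T|H): entropy-coded residual
    return model_bits + residual_bits


def intrinsic_cardinality(x, candidates=range(2, 65)):
    """Return the candidate cardinality with the smallest description length."""
    return min(candidates, key=lambda c: description_length(x, c))


series = [0.0, 1 / 3, 2 / 3, 1.0] * 25
print(intrinsic_cardinality(series))  # 4
```

For this series of four evenly spaced values, cardinality 4 reproduces the data exactly at 2 bits per point; cardinalities 2 and 3 leave residuals that cost more than they save, and every cardinality above 4 pays more than 2 bits per point for the hypothesis alone.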

Highlights

Most algorithms for indexing or mining time series data operate on higher-level representations of the data, which include transforms, quantization, approximations and multi-resolution approaches.
Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Adaptive Piecewise Constant Approximation (APCA) and Piecewise Linear Approximation (PLA) are models that all have their advocates for various data mining tasks, and each has been used extensively (Ding et al 2008).
The question of choosing the best abstraction level and/or representation of the data for a given task/dataset remains open. We investigate this problem by discovering the natural intrinsic model, dimensionality and cardinality of a time series.

Summary

Introduction

Most algorithms for indexing or mining time series data operate on higher-level representations of the data, which include transforms, quantization, approximations and multi-resolution approaches. Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Adaptive Piecewise Constant Approximation (APCA) and Piecewise Linear Approximation (PLA) are models that all have their advocates for various data mining tasks, and each has been used extensively (Ding et al 2008). The question of choosing the best abstraction level and/or representation of the data for a given task/dataset remains open. Although the Minimum Description Length (MDL) principle is the cornerstone of many bioinformatics algorithms (Evans et al 2007; Rissanen 1989) and has had some impact in data mining, it is arguably underutilized in time series data mining (Jonyer et al 2004; Papadimitriou et al 2005).
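A minimal sketch of how MDL could select a dimensionality for a piecewise-constant representation follows. As illustrative assumptions (not the paper's exact encoding), the hypothesis uses w equal-length segments (a PAA-style approximation, simpler than the adaptive segments of APCA), the series is quantized to a 256-level grid, each segment level costs log2(256) = 8 bits, and the residual is charged at its empirical entropy. The helper names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical Shannon entropy (bits per symbol) of a sequence of symbols."""
    n = len(values)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(values).values())


def to_grid(x, card=256):
    """Quantize x to integers 0..card-1 on a uniform grid spanning [min(x), max(x)]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    step = (hi - lo) / (card - 1)
    return [round((v - lo) / step) for v in x]


def segment_dl(ref, w, card=256):
    """Simplified two-part cost of approximating ref with w equal-length constant segments."""
    n = len(ref)
    hyp = []
    for i in range(w):
        start, end = i * n // w, (i + 1) * n // w  # non-empty whenever w <= n
        seg = ref[start:end]
        level = round(sum(seg) / len(seg))         # segment mean, snapped to the grid
        hyp.extend([level] * len(seg))
    residual = [r - h for r, h in zip(ref, hyp)]
    model_bits = w * math.log2(card)               # DL(H): one grid level per segment
    residual_bits = n * entropy_bits(residual)     # DL(T|H): entropy-coded residual
    return model_bits + residual_bits


def intrinsic_dimensionality(x, card=256):
    """Return the segment count w in 1..len(x) with the smallest description length."""
    ref = to_grid(x, card)
    return min(range(1, len(ref) + 1), key=lambda w: segment_dl(ref, w, card))


steps = [0] * 25 + [255] * 25 + [128] * 25 + [64] * 25
print(intrinsic_dimensionality(steps))  # 4
```

On this four-step signal, w = 4 costs 32 bits with a zero residual; each additional segment adds 8 model bits, while fewer segments leave a residual that costs far more than the bits they save.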

Journal: Data Mining and Knowledge Discovery
Publication Date: Feb 15, 2014
Citations: 26
License type: CC BY 2.0

Similar Papers

Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL
Bing Hu ... Scott Evans
01 Dec 2011

FCAD: Feature-based Clipped Representation for Time Series Anomaly Detection
Peng Zhan ... Haoran Xu
27 Sep 2020

An Effective Implementation of Motif-Based Time Series Classification
Nguyen Van Kier ... Duong Tuan Anh
01 Mar 2019

iSAX: disk-aware mining and indexing of massive time series datasets
Jin Shieh ... Eamonn Keogh
Data Mining and Knowledge Discovery | VOL. 19
27 Feb 2009
