Abstract

In recent years, time series classification with shapelets has attracted considerable interest because of its high accuracy and good interpretability. These approaches extract or learn shapelets from the training time series. Although they can achieve higher accuracy than other approaches, they still face several challenges. First, they may suffer from low accuracy when the training dataset is small. Second, some parameters, such as the number of shapelets and the length of each shapelet, and some hyper-parameters, such as the learning rate and regularization weight, must be set manually beforehand, which is difficult without prior knowledge. Third, extracting or learning shapelets incurs a huge computation cost because of the huge search space. In this paper, we extend our previous shapelet learning approach ELIS to ELIS++. To improve accuracy on small training datasets, we propose a data augmentation approach. To learn higher-quality shapelets, building on the PAA shapelet candidate search technique proposed in ELIS, ELIS++ first applies a novel entropy-based shapelet candidate selection mechanism to discover shapelet candidates, and then applies a logistic regression model to adjust the shapelets. To avoid setting the other parameters manually, we propose a Bayesian Optimization based approach. Moreover, we propose two techniques to improve efficiency: coarse-grained shapelet adjustment and SIMD-based parallel computation. We conduct extensive experiments on 35 UCR datasets, and the results verify the effectiveness and efficiency of ELIS++.

Highlights

  • Time series data are pervasive across almost all human endeavors, including medicine, finance and science

  • Since we only focus on the best match position of a shapelet, it is reasonable to rotate the suffix subsequence of any length to the front to generate new time series

  • We introduce a shapelet learning approach ELIS++
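
The rotation idea in the second highlight can be illustrated with a minimal sketch. This is not the ELIS++ implementation: the function names `rotate_augment` and `best_match_distance`, the `step` parameter, and the mean squared distance are all assumptions made for illustration.

```python
from typing import List, Sequence


def rotate_augment(series: Sequence[float], step: int = 1) -> List[List[float]]:
    """Return rotated copies of `series`, moving a suffix of length k to the front.

    `step` is a hypothetical knob that selects which suffix lengths k are used.
    """
    values = list(series)
    n = len(values)
    return [values[n - k:] + values[:n - k] for k in range(step, n, step)]


def best_match_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest mean squared distance between `shapelet` and any window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    return min(
        sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)) / m
        for start in range(len(series) - m + 1)
    )


# Toy data (assumed, not from the paper): the pattern [1, 2, 1] sits at indices 2..4.
series = [0, 0, 1, 2, 1, 0, 0, 0]
shapelet = [1, 2, 1]

augmented = rotate_augment(series)
distances = [best_match_distance(s, shapelet) for s in augmented]
```

In this toy case, five of the seven rotations keep the best-match distance at zero, because the pattern simply moves to a new position. The two rotations whose cut point falls inside the pattern do not preserve it; how ELIS++ treats such cases is not stated in this excerpt.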

Summary

Introduction

Time series data are pervasive across almost all human endeavors, including medicine, finance and science. Many approaches to time series classification have been proposed, most of which fall into two groups: distance-based approaches and structure-based approaches. Distance-based approaches use the raw time series, or subsequences of it, to build the classification model. Structure-based approaches [19, 22, 23] transform the raw numeric time series into an approximate representation, such as Symbolic Aggregate approximation (SAX) [14], and build classification models on that representation. These approaches are efficient, but they suffer from inaccuracy because detailed characteristics are lost. Another limitation is that it is difficult to find the most appropriate approximation granularity.
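
To make the loss of detail and the granularity problem concrete, here is a minimal sketch of a SAX-style transform using only the Python standard library. It follows the usual recipe (z-normalize, average over equal segments with PAA, then map each average to a letter using standard normal breakpoints). The function names, the toy series, and the restriction to series lengths divisible by the segment count are assumptions for illustration, not details taken from the paper.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def znorm(series: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    size = n // segments
    return [mean(series[i:i + size]) for i in range(0, n, size)]


def sax(series: Sequence[float], segments: int, alphabet_size: int) -> str:
    """Map a series to a SAX-style word (alphabet_size assumed to be 2..26)."""
    # Breakpoints split the standard normal distribution into equiprobable bins.
    breakpoints = [
        NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)
    ]
    return "".join(
        LETTERS[bisect_right(breakpoints, v)] for v in paa(znorm(series), segments)
    )


# Two toy series (assumed): same coarse shape, different fine detail.
flat_top = [0, 0, 0, 0, 10, 10, 10, 10]
jagged_top = [0, 0, 0, 0, 8, 12, 8, 12]

coarse = (sax(flat_top, 2, 4), sax(jagged_top, 2, 4))
fine = (sax(flat_top, 8, 4), sax(jagged_top, 8, 4))
```

With two segments, both series collapse to the same word, so a classifier built on that representation cannot tell them apart. With eight segments the words differ, which shows why the choice of granularity matters.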
