Abstract

The representation task in time series data mining is a critical issue because directly manipulating continuous, high-dimensional data efficiently is extremely difficult. One time series representation approach is a symbolic representation called Symbolic Aggregate Approximation (SAX). A central problem in SAX is choosing the appropriate alphabet size (the number of symbols) and word size (the number of segments) to represent a time series. The aim is to achieve the largest alphabet size and the maximum word length with the minimum error rate. The purpose of this study is to propose an integrated approach to symbolic time series representation that improves SAX by optimizing the alphabet and word sizes. The Relative Frequency (RF) binning method is employed to obtain the alphabet size and is integrated with the proposed Multipitch Harmony Search (HS) algorithm to calculate the optimum alphabet and word sizes. RF is used because it obtains a sufficient number of intervals with a lower error rate than other related techniques. HS is an optimization algorithm that randomly generates candidate solutions for the alphabet and word sizes and selects the best ones; it is also compatible with multi-pitch adjustment. The integration of the RF and HS algorithms is designed to maximize the information retained rather than to reduce the error rate. The algorithms are tested on 20 standard time series datasets and compared with the meta-heuristic algorithm GENEBLA and the original SAX algorithm. The experimental results show that the proposed method generates larger alphabet and word sizes and achieves a lower error rate than the compared methods. With larger alphabet and word sizes, the proposed method preserves more of the important information in the data.
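
To make the two parameters concrete, here is a minimal sketch of the standard SAX transform that the study sets out to improve: z-normalization, Piecewise Aggregate Approximation (PAA) down to `word_size` segments, then mapping each segment mean to one of `alphabet_size` letters using breakpoints that cut the standard normal distribution into equiprobable regions. It is not the proposed method: the RF binning and the Multipitch Harmony Search are not reproduced here, and the function names are hypothetical, chosen for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def z_normalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to all zeros.
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, word_size):
    """Piecewise Aggregate Approximation: mean of word_size contiguous frames."""
    n = len(series)
    if not 1 <= word_size <= n:
        raise ValueError("word_size must be between 1 and len(series)")
    return [
        mean(series[i * n // word_size:(i + 1) * n // word_size])
        for i in range(word_size)
    ]


def gaussian_breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26 in this sketch")
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, word_size, alphabet_size):
    """Convert a numeric series into a SAX word of length word_size."""
    cuts = gaussian_breakpoints(alphabet_size)
    segments = paa(z_normalize(series), word_size)
    # The symbol index is the number of breakpoints at or below the segment mean.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in segments)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], word_size=4, alphabet_size=4))  # abcd
```

Larger `alphabet_size` and `word_size` values yield a finer-grained word that retains more detail. The sketch simply takes both values as inputs, and choosing them is exactly the problem the study addresses.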
