Abstract

Time series data are an important type of real-world data that accumulate gradually over time. Because of this dynamic growth, they tend to be high-dimensional and large in scale, and traditional feature extraction methods fall short when such data are clustered. To improve clustering performance on time series data, this study uses a recurrent neural network (RNN) to learn features from the input data. First, a long short-term memory (LSTM) network, a type of RNN, is used to extract features from the time series. Second, pooling is used to reduce the dimensionality of the output features in the last layer of the LSTM network. For long time series, the hidden layer of the LSTM network cannot retain information from every time step, so it is difficult to obtain a compressed representation of the global information from the last layer alone. It is therefore necessary to supplement it with the information from the earlier hidden units. Stacking the hidden unit information from all time steps and applying a pooling operation reduces its dimensionality and compensates for the memory loss caused by an excessively long sequence. Finally, because many time series datasets are imbalanced, the unbalanced K-means (UK-means) algorithm is used to cluster the features after dimensionality reduction. Experiments were conducted on multiple publicly available time series datasets. The results show that LSTM-based feature extraction, combined with pooling-based dimensionality reduction and clustering designed for imbalanced data, is effective for processing time series data.
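The stacking-and-pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pool_hidden_states`, the non-overlapping window, and the choice between max and mean pooling are all assumptions made for illustration, and the toy `states` matrix merely stands in for per-time-step LSTM hidden states.

```python
from typing import List, Sequence


def pool_hidden_states(
    hidden_states: Sequence[Sequence[float]],
    window: int,
    mode: str = "max",
) -> List[float]:
    """Pool stacked hidden states (T steps x H units) along the time axis.

    Hypothetical helper: non-overlapping windows of `window` time steps are
    reduced per hidden unit, and the pooled rows are flattened into a single
    feature vector of length ceil(T / window) * H.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    if not hidden_states:
        return []

    hidden_size = len(hidden_states[0])
    pooled: List[float] = []
    for start in range(0, len(hidden_states), window):
        block = hidden_states[start:start + window]
        for unit in range(hidden_size):
            values = [step[unit] for step in block]
            if mode == "max":
                pooled.append(max(values))
            else:
                pooled.append(sum(values) / len(values))
    return pooled


# Toy stand-in for LSTM outputs: 4 time steps, 2 hidden units each.
states = [
    [0.1, 0.9],
    [0.4, 0.2],
    [0.3, 0.5],
    [0.8, 0.1],
]
features = pool_hidden_states(states, window=2, mode="max")
print(features)  # [0.4, 0.9, 0.8, 0.5]
```

Because every time step contributes to some pooling window, the resulting vector keeps information from early hidden states that the final hidden state alone may have lost, while being shorter than the full stacked matrix. A vector like this would then be passed to the clustering stage.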

Highlights

  • Time series data are a common type of data in work and life. The time series dataset is a collection of observations at different moments collected with a certain collection technology and at certain time intervals. Therefore, each observation result in time series data is often time stamped

  • Because time series data have many special properties, commonly used clustering algorithms cannot achieve satisfactory results when clustering time series data. The purpose of this research is to find suitable models for various time series data. The research on time series data generally focuses on the chronological nature of time series data

  • This study uses a recurrent neural network (RNN), which can process data in chronological order, to learn from the data


Summary

Introduction

Time series data are a common type of data in work and life. A time series dataset is a collection of observations at different moments collected with a certain collection technology and at certain time intervals. Therefore, each observation in time series data is often time-stamped. With the continuous improvement of computer technology and storage capabilities, storage devices hold large volumes of time series data. Time series data refer to data composed of sequence values or events that change over time. There are two main types of time series data in reality: continuous and numerical. The subtype values mainly describe the relationship between a time series and other business activities or other derivative data, for example, whether the appearance or function of a certain product plays a key role in the sales of the entire enterprise. Time series data analysis is mainly used for prediction, classification, and anomaly detection. One approach is based on traditional analysis techniques. The second is based on deep learning technology.

