Most existing research on time-series event prediction (TSEP) focuses on improving algorithms that classify subsequence sets (sets composed of multiple adjacent subsequences). However, these methods neither model the temporal dependence between subsequence sets nor capture the transition relationships between events, so they perform poorly on small-sample data sets. Sequence labeling, meanwhile, is a common problem in natural language processing and image segmentation. To address these shortcomings, this paper proposes a new framework for time-series event prediction that transforms the event prediction problem into a labeling problem in order to better capture the temporal relationships between subsequence sets. Specifically, the framework first uses a sequence clustering algorithm to identify representative patterns in the time series, then represents each subsequence set as a weighted combination of patterns, and applies the eXtreme Gradient Boosting (XGBoost) algorithm for feature selection. The selected pattern features are then used as the input of a long short-term memory (LSTM) model to obtain preliminary predictions. A fully connected conditional random field (CRF) then smooths and refines these preliminary predictions to produce the final prediction result. Finally, event prediction experiments on five real data sets show that the proposed CX-LC method improves prediction accuracy over six other models.
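The pattern-representation step can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the weighting scheme, so everything below is an assumption made for illustration: the patterns are taken as given (for example, cluster centroids), each subsequence is z-normalized, and the weights come from a softmax over negative Euclidean distances. The function names `sliding_windows`, `z_normalize`, and `pattern_weights` are hypothetical and not taken from the paper.

```python
import math


def sliding_windows(series, width, step=1):
    """Split a series into fixed-width subsequences."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def z_normalize(window):
    """Rescale a window to zero mean and unit variance (constant windows map to zeros)."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pattern_weights(subsequence_set, patterns):
    """Describe a set of adjacent subsequences as weights over the patterns.

    Assumed weighting (not from the paper): each subsequence distributes one
    unit of weight across the patterns via a softmax over negative distances,
    and the result is averaged over the set, so the weights sum to 1.
    """
    totals = [0.0] * len(patterns)
    for sub in subsequence_set:
        normalized = z_normalize(sub)
        scores = [math.exp(-euclidean(normalized, p)) for p in patterns]
        norm = sum(scores)
        for k, score in enumerate(scores):
            totals[k] += score / norm
    return [t / len(subsequence_set) for t in totals]


# Two hypothetical patterns of length 3: a rising and a falling shape.
patterns = [z_normalize([0, 1, 2]), z_normalize([2, 1, 0])]

# A steadily rising series yields a feature vector dominated by the rising pattern.
subsequence_set = sliding_windows([1, 2, 3, 4, 5], width=3)
features = pattern_weights(subsequence_set, patterns)
```

In a pipeline like the one described, one such feature vector per subsequence set would be passed on to feature selection and then to the sequence model. A soft weighting is used here instead of a hard nearest-pattern assignment so that a subsequence lying between two patterns contributes to both.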