Abstract

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm for mining fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different ways, traditional SPM algorithms cannot be applied to text data directly. The proposed algorithm is designed to solve this problem. First, the similarity of fault text records is measured and similar faults are grouped into one class. For this purpose, the paper proposes a new text similarity measurement model based on the word embedding distance. Compared with classic text similarity measurement methods, this model achieves good results in short text classification. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint that allows a desired number of sequential patterns to be obtained as needed. Finally, the fault text records of an aircraft are used as experimental data for mining fault sequential patterns. The experiments show that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business and finance.
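To make the first step more concrete, the sketch below groups short fault texts by a distance computed over word embeddings. It is only an illustration under stated assumptions, not the model proposed in the paper: the two-dimensional vectors in `TOY_EMBEDDINGS` are invented, the distance is a simple relaxed nearest-word distance (each word is matched to its closest word in the other text), and the greedy grouping rule and the `threshold` parameter are hypothetical.

```python
import math

# Hypothetical 2-D toy embeddings; a real system would load trained word vectors.
TOY_EMBEDDINGS = {
    "pump": (1.0, 0.0),
    "valve": (0.9, 0.1),
    "leak": (0.0, 1.0),
    "leakage": (0.1, 0.9),
    "radio": (-1.0, 0.0),
    "noise": (0.0, -1.0),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_way_distance(src, dst, emb):
    """Average, over words in src, of the distance to the nearest word in dst."""
    return sum(min(euclidean(emb[s], emb[d]) for d in dst) for s in src) / len(src)


def embedding_distance(text_a, text_b, emb=TOY_EMBEDDINGS):
    """Symmetric relaxed distance between two short texts (assumed measure)."""
    a = [w for w in text_a.lower().split() if w in emb]
    b = [w for w in text_b.lower().split() if w in emb]
    if not a or not b:
        # No known words on one side: treat the texts as maximally dissimilar.
        return math.inf
    return max(one_way_distance(a, b, emb), one_way_distance(b, a, emb))


def group_similar(texts, threshold, emb=TOY_EMBEDDINGS):
    """Greedy grouping: a text joins the first class whose first member is
    within `threshold`; otherwise it starts a new class."""
    classes = []
    for text in texts:
        for cls in classes:
            if embedding_distance(text, cls[0], emb) <= threshold:
                cls.append(text)
                break
        else:
            classes.append([text])
    return classes


records = ["pump leak", "valve leakage", "radio noise"]
print(group_similar(records, threshold=0.2))
```

With these toy vectors, "pump leak" and "valve leakage" fall into the same class although they share no words, which is the kind of behavior a bag-of-words comparison cannot provide.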

Highlights

  • A time series refers to a series of values of the same statistical index arranged in order of their occurrence time

  • Agrawal and Srikant [1] first proposed the concept of sequential pattern mining (SPM) and proposed the AprioriAll, AprioriSome and Generalized Sequential Pattern (GSP) algorithms

  • Wang [15] discovered that weather is the main factor causing turnout failures and established a failure prediction method based on a Bayesian network

Summary

Introduction

A time series (or dynamic series) is a series of values of the same statistical index arranged in order of their occurrence time. We propose a text similarity measurement method based on the word embedding distance model. On the basis of this text similarity measurement, we establish a sequential pattern mining algorithm with an event window. The algorithm proposed in this paper has wide applicability. It can be used for text mining in many fields to discover sequential patterns. The rest of this paper is organized as follows: Section 2 reviews the literature, including the development and application of SPM algorithms, text similarity measurement methods and text data-based SPM algorithms.
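The idea of counting pattern support under an event window can be sketched as follows. This is a minimal illustration and not the paper's definition: it assumes that each sequence is a time-ordered list of fault class labels, that the window limits how many events may separate two consecutive pattern items, and that support is the fraction of sequences containing the pattern. The function names and the `window` parameter are hypothetical.

```python
def occurs_within_window(sequence, pattern, window):
    """Return True if the items of `pattern` appear in order in `sequence`
    and each consecutive pair of matched positions is at most `window`
    events apart (assumed interpretation of the event window)."""

    def search(start, k, prev):
        if k == len(pattern):
            return True
        # The first item may match anywhere; later items must fall in the window.
        end = len(sequence) if prev is None else min(len(sequence), prev + window + 1)
        for i in range(start, end):
            if sequence[i] == pattern[k] and search(i + 1, k + 1, i):
                return True
        return False

    return search(0, 0, None)


def support(sequences, pattern, window):
    """Fraction of sequences that contain `pattern` under the event window."""
    if not sequences:
        return 0.0
    hits = sum(occurs_within_window(s, pattern, window) for s in sequences)
    return hits / len(sequences)


# Hypothetical fault-class sequences, one per monitored unit.
sequences = [
    ["A", "B", "C"],
    ["A", "C", "C", "B"],
    ["B", "A"],
]
print(support(sequences, ["A", "B"], window=1))
print(support(sequences, ["A", "B"], window=3))
```

Widening the window admits more matches, so in this reading the window acts as a tunable knob for how many patterns reach a given support threshold.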

Sequential Pattern Mining Algorithm
Text Mining and Similarity Measurement
Sequential Pattern Mining for Text Data
Text Similarity Measurement Based on Word Embedding Distance Model
Text Data Preprocessing
Measurement of Text Similarity Based on the Bag-of-Words Model
Measurement of Text Similarity Based on the Topic Model
Model Building
Model Checking
Sequential Pattern Mining
Similar Events Sets Mining
Calculation of Support Degree Based on an Event Window and SES
Concepts and Definitions
Experimental Results
Robustness Evaluation of the Algorithm
Discussion
Application in Business Activities and Decision
Conclusions