Mining approximate sequential patterns with gaps

Kelly K Yip, David A Nembhard

doi:10.1504/ijdmmm.2015.069249

Abstract

Time series data are found in diverse fields including science, business, medicine and engineering. In this paper, we consider sequential pattern mining for categorical time series data that contain multiple independent time series. Frequent patterns are considered important in a variety of applications. However, it is common for data to contain noise, and/or for the source process to have considerable variability. Conventional sequential pattern mining methods that use exact matching address some, but not all, of these difficulties. Two general approaches used in previous studies to mine sequential patterns in data with noise are distance-based clustering and hidden Markov models. While these approaches are useful in mining frequent sequential patterns in noisy data, we further propose a framework (MWASP: multiple-width approximate sequential pattern mining) that uncovers frequent approximate sequential patterns with various widths. A mined pattern in this framework is representative of a group of sequences that follow the pattern's event flow order. This gives insight into the occurrence of the pattern longitudinally, as well as across the population. A pattern can thus be recognised as common across the multiple time series, across time, or both.

Full Text