Abstract

Distinguishing subsequence patterns mining aims to discover the differences between different categories of sequence databases and to express characteristics of classes. It plays an important role in biomedicine, feature information selection, time-series classification, and other areas. The existing distinguishing subsequence patterns mining only focuses on whether a pattern appears in a sequence, regardless of the number of occurrences of the pattern in the sequence and the proportion of the pattern in the entire sequence database, which affects the discovery of the distinguishing patterns when there are a large number of irrelevant occurrences. Therefore, the nonoverlapping conditional distinguishing subsequence patterns mining algorithm is proposed. In this paper, we focus on the number of nonoverlapping occurrences that effectively reduce the number of irrelevant or redundant occurrences, and in this way, the number of occurrences can be better grasped. At the same time, we use a specially designed data structure, namely, a Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features, and carry out classification experiments on DNA sequences and time-series data with two classes. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.