Abstract

In biological sequence analysis, long and frequently occurring patterns tend to be interesting, and data miners therefore try to obtain frequent patterns with periodical wildcard gaps. However, under the existing set of definitions, the Apriori property does not hold; consequently, state-of-the-art algorithms are rather complex. This paper proposes an alternative definition of the number of offset sequences, obtained by adding a number of dummy characters. Under the new definition the Apriori property holds, so a simple Apriori algorithm can mine all frequent patterns with minimal effort. This study also serves as a foundation for further research on sequence pattern mining.
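
To make the role of the Apriori property concrete, the following is a minimal sketch of level-wise mining of patterns with periodical wildcard gaps. It is not the paper's algorithm: the function names (`count_occurrences`, `support_ratio`, `mine_frequent`) are hypothetical, and the normalisation used for the number of offset sequences, L · W^(m-1) for a sequence of length L, a pattern of length m, and a gap range of width W, is an assumption of this sketch rather than something stated in the abstract. Under that assumed normalisation, extending a pattern by one character can never increase its support ratio, which is what allows the candidates to be pruned level by level.

```python
from typing import Dict


def count_occurrences(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Count position tuples that match `pattern` in `seq`, where the number of
    wildcard positions between consecutive pattern characters lies in
    [min_gap, max_gap]."""
    # ways[i] = number of matches of the pattern prefix seen so far that end at i
    ways = [1 if ch == pattern[0] else 0 for ch in seq]
    for p in pattern[1:]:
        nxt = [0] * len(seq)
        for i, ch in enumerate(seq):
            if ch != p:
                continue
            # the previous character sits at j, with i - j - 1 wildcards between
            lo = i - max_gap - 1
            hi = i - min_gap - 1
            for j in range(max(lo, 0), hi + 1):
                nxt[i] += ways[j]
        ways = nxt
    return sum(ways)


def support_ratio(seq: str, pattern: str, min_gap: int, max_gap: int) -> float:
    """Occurrences divided by an ASSUMED number of offset sequences,
    L * W**(m - 1), i.e. every start position counts as if the sequence were
    padded with dummy characters at the end."""
    width = max_gap - min_gap + 1
    offsets = len(seq) * width ** (len(pattern) - 1)
    return count_occurrences(seq, pattern, min_gap, max_gap) / offsets


def mine_frequent(seq: str, min_gap: int, max_gap: int,
                  threshold: float) -> Dict[str, float]:
    """Level-wise (Apriori-style) mining: only frequent patterns are extended."""
    if threshold <= 0:
        raise ValueError("threshold must be positive, otherwise mining never stops")
    alphabet = sorted(set(seq))
    frequent: Dict[str, float] = {}
    level = list(alphabet)
    while level:
        survivors = []
        for cand in level:
            ratio = support_ratio(seq, cand, min_gap, max_gap)
            if ratio >= threshold:
                frequent[cand] = ratio
                survivors.append(cand)
        # grow each surviving pattern by one character; infrequent ones are pruned
        level = [p + c for p in survivors for c in alphabet]
    return frequent
```

A call such as `mine_frequent("ACACAC", 0, 1, 0.2)` mines the toy sequence with between zero and one wildcard between consecutive characters. The pruning step is only safe because the assumed ratio is non-increasing under extension; with a normalisation that shrinks near the end of the sequence, an infrequent pattern could have a frequent extension and would be missed.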
