Abstract

Mining biological data can provide insight into many areas of biology. One such task, finding co-occurring biosequences, is essential for biological analysis. Sequential pattern mining reveals implicit motifs of all lengths, which have specific structures and functional significance in biological sequences. Traditional sequential pattern mining algorithms are inefficient on long sequences over small alphabets, such as DNA and protein sequences, so a different approach is needed. This work proposes DFSG, a Depth-First Spelling algorithm for mining sequential patterns (motifs) with Gap constraints in biological sequences. On biological sequences, DFSG runs substantially faster than GenPrefixSpan, a method based on PrefixSpan, which is one of the fastest traditional sequential pattern mining algorithms.
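
To make the idea of depth-first spelling under gap constraints concrete, the following is a minimal illustrative sketch and not the DFSG algorithm itself, whose details the abstract does not give. It assumes that a gap is the number of skipped symbols between consecutive motif characters, that support is the number of sequences containing at least one occurrence, and that patterns are grown one symbol at a time. The function name `mine_gapped_motifs` and the parameters `min_support`, `min_gap`, `max_gap`, and `max_len` are hypothetical.

```python
from collections import defaultdict


def mine_gapped_motifs(sequences, min_support, min_gap=0, max_gap=1, max_len=4):
    """Depth-first pattern growth with gap constraints (illustrative only).

    Returns a dict mapping each frequent pattern to its support, i.e. the
    number of sequences that contain at least one occurrence of it.
    """
    alphabet = sorted(set("".join(sequences)))
    results = {}

    def extend(pattern, ends):
        # ends maps a sequence index to the set of positions at which
        # some occurrence of `pattern` ends in that sequence.
        results[pattern] = len(ends)
        if len(pattern) == max_len:
            return
        for symbol in alphabet:
            new_ends = defaultdict(set)
            for sid, positions in ends.items():
                seq = sequences[sid]
                for end in positions:
                    # The next symbol may follow after min_gap..max_gap
                    # skipped positions (assumed gap definition).
                    first = end + 1 + min_gap
                    last = min(end + 1 + max_gap, len(seq) - 1)
                    for pos in range(first, last + 1):
                        if seq[pos] == symbol:
                            new_ends[sid].add(pos)
            # Prune: only frequent patterns are spelled further.
            if len(new_ends) >= min_support:
                extend(pattern + symbol, new_ends)

    # Seed the search with every frequent single symbol.
    for symbol in alphabet:
        ends = defaultdict(set)
        for sid, seq in enumerate(sequences):
            for pos, ch in enumerate(seq):
                if ch == symbol:
                    ends[sid].add(pos)
        if len(ends) >= min_support:
            extend(symbol, ends)
    return results


motifs = mine_gapped_motifs(["ACGT", "AGGT", "ACT"], min_support=3, max_gap=2)
```

With these three toy sequences, a support threshold of 3, and at most two skipped symbols per gap, the sketch reports the patterns A, T, and AT, each with support 3. Tracking only the end positions of occurrences keeps each extension step cheap, which matters when the alphabet is small and the sequences are long.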
