Abstract

The rapid development of bioinformatics has resulted in an explosion of DNA sequence data, which is characterized by a large number of items. Studies have shown that biological functions are dictated by contiguous portions of the DNA sequence. Finding contiguous frequent patterns in long data sequences such as DNA sequences is a particularly challenging task and can pave the way towards new breakthroughs. Apriori-based techniques were among the first to be used in frequent contiguous pattern mining. Later, improved approaches such as GSP and PrefixSpan were also applied, but these approaches required a large number of sequence scans, generated a large number of candidates, or required a higher number of intermediate sequential patterns. In this paper, an improvement of the positional-based approach for contiguous frequent pattern mining in DNA sequences is proposed. The proposed algorithm improves the existing positional-based approach by introducing a new amalgamated sorting and joining technique, which helps to reduce time and space complexity. The proposed approach outperforms traditional existing contiguous frequent pattern mining approaches.
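
To make the problem concrete, the following is a minimal sketch of generic position-based contiguous frequent pattern mining. It is not the algorithm proposed in the paper and does not include its amalgamated sorting and joining technique. The function name `contiguous_frequent_patterns`, the `min_support` and `max_len` parameters, and the definition of support as the number of sequences containing a pattern are all assumptions made for illustration.

```python
from collections import defaultdict


def contiguous_frequent_patterns(sequences, min_support, max_len=None):
    """Return {pattern: support} for contiguous patterns.

    Assumption: support is the number of input sequences that contain
    the pattern at least once as a contiguous substring.
    """
    # Record every occurrence of each single symbol as (sequence id, offset).
    positions = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for i, ch in enumerate(seq):
            positions[ch].add((sid, i))

    def support(occurrences):
        # Count distinct sequences among the recorded occurrences.
        return len({sid for sid, _ in occurrences})

    # Frequent length-1 patterns seed the search.
    singles = {p: occ for p, occ in positions.items()
               if support(occ) >= min_support}
    current = dict(singles)
    result = {}
    k = 1  # length of the patterns held in `current`
    while current:
        for pattern, occ in current.items():
            result[pattern] = support(occ)
        if max_len is not None and k >= max_len:
            break
        extended = {}
        for pattern, occ in current.items():
            for ch, ch_occ in singles.items():
                # Positional join: pattern starting at (sid, i) extends by ch
                # only if ch occurs at (sid, i + k), so no rescan is needed.
                joined = {(sid, i) for sid, i in occ if (sid, i + k) in ch_occ}
                if support(joined) >= min_support:
                    extended[pattern + ch] = joined
        current = extended
        k += 1
    return result
```

For example, calling `contiguous_frequent_patterns(["ACGTACG", "TACGA", "GGACG"], min_support=3)` finds `ACG` and its contiguous sub-patterns in all three sequences. Because each pattern stores its start positions, extending it by one symbol is a set lookup rather than another pass over the data, which is the general idea behind positional approaches.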
