MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure

Weina Li, Jiadong Ren, Xiangtao Li

doi:10.1371/journal.pone.0195601

Abstract

Extracting meaningful patterns from biological sequence data is an important approach to discovering the regulatory rules of genes and proteins and their inheritance relationships. Existing sequence pattern discovery algorithms, such as MSPM and FBSB, suffer from low efficiency and accuracy. To address this issue, this paper presents MpBsmi, a new algorithm for biological sequence pattern mining based on a data index structure. MpBsmi uses a sequence position table (ST) and a sequence database index structure (DB-Index) for data storage, mining and pattern expansion. The ST and DB-Index of single items are first obtained by scanning the sequence database once. A new fast support counting algorithm then mines the ST to identify the frequent single items. Based on a connection strategy, the frequent patterns are expanded, and the ST of the expanded patterns is updated by scanning the DB-Index. The fast support counting algorithm is applied again to obtain the frequent expanded patterns. Finally, a new pruning technique is applied to the extended patterns to avoid generating an unnecessarily large number of candidate patterns. Experimental results on multiple classical protein sequences from the Pfam database validate the accuracy, stability and scalability of the proposed algorithm. The results show that the proposed algorithm achieves better space efficiency, stability and scalability than MSPM and FBSB, the two main algorithms for biological sequence mining.
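The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layout of the position table (a mapping from pattern to `(sequence id, start position)` pairs), the use of the raw sequence list in place of the DB-Index, and the function names `build_single_item_table`, `support` and `mine_contiguous_patterns` are all assumptions made for illustration. The pruning shown is plain Apriori-style pruning (only frequent patterns are extended), standing in for the paper's pruning technique, whose details are not given here.

```python
from collections import defaultdict


def build_single_item_table(sequences):
    """One scan of the database: record every (seq_id, start) occurrence of each single item.

    Hypothetical stand-in for the position table of single items.
    """
    table = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            table[item].append((seq_id, pos))
    return table


def support(occurrences):
    """Support = number of distinct sequences in which the pattern occurs."""
    return len({seq_id for seq_id, _ in occurrences})


def mine_contiguous_patterns(sequences, min_support_ratio):
    """Return {pattern: support} for all frequent contiguous patterns."""
    min_count = min_support_ratio * len(sequences)
    table = build_single_item_table(sequences)
    frequent = {p: occ for p, occ in table.items() if support(occ) >= min_count}
    result = {}
    while frequent:
        result.update({p: support(occ) for p, occ in frequent.items()})
        # Extend each frequent pattern by the item that directly follows it.
        # The raw sequences play the role of the index lookup (an assumption).
        extended = defaultdict(list)
        for pattern, occ in frequent.items():
            for seq_id, start in occ:
                nxt = start + len(pattern)
                seq = sequences[seq_id]
                if nxt < len(seq):
                    extended[pattern + seq[nxt]].append((seq_id, start))
        # Apriori-style pruning: infrequent extensions are never extended further.
        frequent = {p: occ for p, occ in extended.items() if support(occ) >= min_count}
    return result
```

Calling `mine_contiguous_patterns(["ACDA", "ACD", "CDA"], 0.5)` on this toy input returns every contiguous pattern that occurs in at least two of the three sequences. Because each extension reuses the stored start positions, the sequences are scanned in full only once, which is the property the position table is meant to provide.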

Highlights

Biological sequences are an important component of bioinformatics data and generally fall into three categories: DNA, RNA and protein sequences [1]
This paper presents MpBsmi, a new algorithm for biological sequence pattern mining based on a data index structure
The proposed algorithm achieves better space efficiency, stability and scalability than MSPM and FBSB, the two main algorithms for biological sequence mining

Summary

Introduction

Biological sequences are an important component of bioinformatics data and generally fall into three categories: DNA, RNA and protein sequences [1].

The biological sequence pattern mining algorithm MpBsmi begins by building the position table: (1) scan the sequence database once to construct the position table 1-ST and the database index 1-DB-Index of single items; the fast support counting algorithm is then used to obtain the frequent sequences 1-BSP.

With the support threshold fixed at 40%, the same as in Experiment 2, the biological sequence patterns obtained are shown in Table 7: the first column shows the support threshold, and the (k+1)th column and the kth column give the data set size and the corresponding number of patterns mined, wherein 1


Journal: PLOS ONE	Publication Date: Apr 23, 2018
Citations: 1	License type: CC BY 4.0


Similar Papers

Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree
Fei Xie ... Xiaoke Zhang
INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL | VOL. 14
05 Aug 2019

Efficiently Detecting Frequent Patterns in Biological Sequences
Wei Liu ... Ling Chen
01 Oct 2011

Mining Frequent Maximal-Length Contiguous Sequences in Biological Data Sequences (original title in Korean)
Tae-Ho Kang ... Jae-Soo Yoo
The KIPS Transactions:PartD | VOL. 15D
30 Apr 2008

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences
Tae-Ho Kang ... Jae-Soo Yoo
International Journal of Contents | VOL. 3
30 Jun 2007
