Approximate string matching in DNA sequences

Lok-Lam Cheng, Siu-Ming Yiu, D.W. Cheung

doi:10.1109/dasfaa.2003.1192395

Abstract

Approximate string matching on large DNA sequence data is very important in bioinformatics. Some studies have shown that the suffix tree is an efficient data structure for approximate string matching, and that it performs better than the suffix array if the data structure can be stored entirely in memory. However, our study finds that the suffix array is much better than the suffix tree for indexing DNA sequences, since the data structure has to be created and stored on disk due to its size. We propose a novel auxiliary data structure that greatly improves the efficiency of the suffix array for approximate string matching in the external memory model. The second problem we tackle is parallel approximate matching in DNA sequences. We propose two novel parallel algorithms for this problem and implement them on a PC cluster. The results show that when the error allowed is small, a direct partitioning of the array over the machines in the cluster is the more efficient approach. On the other hand, when the error allowed is large, partitioning the data over the machines is the better approach.
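
As a rough illustration of how a suffix array can support approximate matching, the sketch below uses a standard pigeonhole filter: the pattern is split into k + 1 pieces, each piece is located exactly by binary search on the suffix array, and every candidate position is then verified. This is not the auxiliary data structure or the parallel algorithms proposed in the paper, which the abstract does not describe. It assumes an in-memory index, counts mismatches only (Hamming distance) rather than general edit errors, and all function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction by sorting suffixes; adequate only for small inputs."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def exact_occurrences(text: str, sa: List[int], piece: str) -> List[int]:
    """Return all start positions of `piece`, found by binary search on `sa`."""
    m = len(piece)
    # Lower bound: first suffix whose length-m prefix is >= piece.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < piece:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > piece.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= piece:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def approx_find(text: str, pattern: str, k: int) -> List[int]:
    """Positions where `pattern` matches `text` with at most k mismatches.

    Assumption: errors are substitutions only (Hamming distance).
    """
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than the number of errors")
    sa = build_suffix_array(text)
    # Split the pattern into k + 1 non-empty pieces. With at most k
    # mismatches, at least one piece must occur exactly (pigeonhole).
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    hits = set()
    for a, b in zip(bounds, bounds[1:]):
        for q in exact_occurrences(text, sa, pattern[a:b]):
            p = q - a  # candidate start of the whole pattern
            if p < 0 or p + m > len(text):
                continue
            mismatches = sum(x != y for x, y in zip(text[p:p + m], pattern))
            if mismatches <= k:
                hits.add(p)
    return sorted(hits)


if __name__ == "__main__":
    print(approx_find("ACGTACGTTAGCACGA", "ACGT", 1))
```

The filter fits the abstract's remark about error levels: as k grows, the pieces get shorter, exact lookups return more candidates, and verification dominates the cost.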

Similar Papers

LibFLASM: a software library for fixed-length approximate string matching.
Lorraine A K Ayad ... Ahmad Retha
BMC Bioinformatics | VOL. 17
10 Nov 2016

A Preprocessing for Approximate String Matching
Kensuke Baba ... Yasuhiro Yamada
01 Jan 2010

Approximate string matching for high-throughput sequencing
01 Jan 2015

Implementation of a programmable array processor architecture for approximate string matching algorithms on FPGAs
P.D Michailidis ... K.G Margaritis
01 Jan 2006
