Computation of Rank and Select Functions on Hierarchical Binary String and Its Application to Genome Mapping Problems for Short-Read DNA Sequences

Kouichi Kimura, Sumio Sugano, Asako Koike, Yutaka Suzuki

doi:10.1089/cmb.2008.0146

Abstract

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length. Values of rank and select functions are computed hierarchically in [(log_2 n)/8] iterations, where n is the string length. For example, for an unbiased random binary string of 64 G bits, each value of these functions can be computed in about a microsecond, on average, on a single 3.0-GHz CPU using 8+ GB of memory. We also present applications of these algorithms to genome mapping problems for large-scale short-read DNA sequence data, especially data produced by ultra-high-throughput new-generation DNA sequencers. The algorithms are applied to the binarization of the Burrows-Wheeler transform of the human genome DNA sequence. For the sake of high-speed performance, we adopted a somewhat stringent mapping condition that allows at most a single-base mismatch (a substitution, insertion, or deletion of a single base) per query sequence. An experimentally implemented program mapped several thousand sequences per second on a single 3.0-GHz CPU, several times faster than ELAND, a widely used mapping program with the Illumina-Solexa 1G analyser.
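To make the two functions concrete, here is a minimal Python sketch of rank and select on a binary string. It is not the authors' hierarchical data structure: it assumes a single level of per-block cumulative counts, and the names RankSelect, BLOCK, rank1, and select1 are hypothetical, chosen only for illustration. The idea it shares with the abstract is that precomputed partial sums over subregions let a query skip most of the string.

```python
from bisect import bisect_left
from itertools import accumulate

BLOCK = 8  # hypothetical block width in bits, chosen only for illustration


class RankSelect:
    """Rank/select over a list of 0/1 bits using one level of block sums."""

    def __init__(self, bits):
        self.bits = list(bits)
        # Count the 1s in each block, then accumulate so that
        # cum[k] = number of 1s in bits[0 : k * BLOCK].
        block_sums = [sum(self.bits[i:i + BLOCK])
                      for i in range(0, len(self.bits), BLOCK)]
        self.cum = [0] + list(accumulate(block_sums))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        k = i // BLOCK
        # Stored count up to the block boundary, plus a short in-block scan.
        return self.cum[k] + sum(self.bits[k * BLOCK:i])

    def select1(self, j):
        """Position of the j-th 1 (j starts at 1), or -1 if there is none."""
        if j < 1 or j > self.cum[-1]:
            return -1
        # Find the block k with cum[k] < j <= cum[k + 1].
        k = bisect_left(self.cum, j) - 1
        remaining = j - self.cum[k]
        # Scan forward from the start of that block until the j-th 1.
        for pos in range(k * BLOCK, len(self.bits)):
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
        return -1
```

With this convention, rank1(i) counts the 1s strictly before position i, so rank1(select1(j)) equals j - 1 for every valid j. A multi-level structure such as the one described in the abstract would repeat the same narrowing step across several scales rather than scanning inside a block.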


Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: Nov 1, 2009
Citations: 10
