Abstract

Background: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.

Results: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.

Conclusions: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

Highlights

  • Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS)

  • Basic Local Alignment with Successive Refinement (BLASR) maps reads using coarse alignment methods developed during whole genome alignment (WGA) studies, and speeds up these methods by using the advanced data structures employed in many next-generation sequencing (NGS) mapping studies

  • We present a practical comparison of alignment methods on PacBio RS sequences
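
To make the idea of a coarse alignment step concrete, the following is a minimal sketch of seed-based candidate finding: short exact matches (k-mers) between a read and a reference vote for the offset at which the read most plausibly starts. It is a generic illustration, not BLASR's implementation and not the data structures it uses. The function names, the seed length `k`, and the `bin_width` parameter are assumptions made for this example. Offsets are binned because inserted or deleted bases shift the offset by a few positions.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offset(read: str, index, k: int, bin_width: int = 4):
    """Return the approximate reference offset supported by the most seeds.

    Each shared k-mer votes for (ref_pos - read_pos). Offsets are grouped
    into bins of size bin_width (a hypothetical tuning knob) so that small
    shifts caused by indels still vote for the same candidate.
    Returns None when the read shares no k-mer with the reference.
    """
    votes = Counter()
    for read_pos in range(len(read) - k + 1):
        for ref_pos in index.get(read[read_pos:read_pos + k], ()):
            votes[(ref_pos - read_pos) // bin_width] += 1
    if not votes:
        return None
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_width  # lower edge of the winning offset bin


# Toy reference in which every 4-mer is unique.
reference = "ACGTTGCAAGGCTTAACCGGATATCGCGTAGC"
index = build_kmer_index(reference, k=4)

# Read taken from position 8 with the base at reference position 16 deleted.
read = reference[8:16] + reference[17:28]
print(candidate_offset(read, index, k=4))  # 8
```

In this toy case the seeds before the deletion vote for offset 8 and those after it vote for offset 9; both fall in the same bin, so the read is still placed near position 8. A real mapper would follow this with a more careful alignment of the candidate region.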

Summary

Introduction

Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. Reads produced by Sanger sequencing, which are highly accurate and nearly 1000 bases long, are successfully mapped using hash-based methods such as MEGABLAST [2], cross_match (Green P., www.phrap.org, unpublished), and BLAT [3]. These methods are too inefficient to map read sets from next-generation sequencing (NGS) instruments by Illumina (San Diego, CA, USA)
