Abstract

Summary: Many high-throughput sequencing experiments produce paired DNA reads. Paired-end DNA reads provide extra positional information that is useful in reliable mapping of short reads to a reference genome, as well as in downstream analyses of structural variations. Given the importance of paired-end alignments, it is surprising that there have been no previous publications focusing on this topic. In this article, we present a new probabilistic framework to predict the alignment of paired-end reads to a reference genome. Using both simulated and real data, we compare the performance of our method with six other read-mapping tools that provide a paired-end option. We show that our method provides a good combination of accuracy, error rate and computation time, especially in more challenging and practical cases, such as when the reference genome is incomplete or unavailable for the sample, or when there are large variations between the reference genome and the source of the reads. An open-source implementation of our method is available as part of Last, a multi-purpose alignment program freely available at http://last.cbrc.jp.

Contact: martin@cbrc.jp

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Many high-throughput sequencers provide a paired-end option, in which each of the two opposite strands of a DNA fragment is read from the edge to the interior in the 5′–3′ direction, generating a pair of reads

  • Paired-end reads can be obtained by a simple modification to the standard single-end workflow; yet, they provide several benefits over single-end reads. They contain extra positional information that aids in accurate mapping of reads to a reference, for instance, by disambiguating alignments when one of the ends aligns to a repetitive region

  • We focus on the task of mapping a set of paired-end reads to a reference genome, which is often the first and fundamental step in inferring biological phenomena from high-throughput sequencing data


Introduction

Many high-throughput sequencers provide a paired-end option, in which each of the two opposite strands of a DNA fragment is read from the edge to the interior in the 5′–3′ direction, generating a pair of reads. Paired-end reads can be obtained by a simple modification to the standard single-end workflow; yet, they provide several benefits over single-end reads. They contain extra positional information that aids in accurate mapping of reads to a reference, for instance, by disambiguating alignments when one of the ends aligns to a repetitive region. They are extremely useful in downstream analyses of structural variations, such as detection of indels or rearrangements. For most aligners, the use of pairing information significantly improves mapping accuracy.
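To make the disambiguation idea concrete, the following is a minimal illustrative sketch, not the method implemented in Last. It assumes that each read end comes with a list of candidate placements and per-candidate log-likelihoods, and that fragment lengths follow a normal distribution with known mean and standard deviation. The function names, the coordinate convention and the numeric values are all hypothetical.

```python
import math
from itertools import product


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution (assumed fragment-length model)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def best_pair(fwd_candidates, rev_candidates, mean_len=300.0, sd_len=30.0):
    """Pick the most likely combination of candidate placements for a read pair.

    fwd_candidates: list of (fragment_start, log_likelihood) for the
        forward-strand read.
    rev_candidates: list of (fragment_end, log_likelihood) for the
        reverse-strand read.
    Returns (score, fragment_start, fragment_end), or None if no
    combination implies a positive fragment length.
    """
    best = None
    for (start, f_ll), (end, r_ll) in product(fwd_candidates, rev_candidates):
        frag_len = end - start
        if frag_len <= 0:
            continue  # reads would not face each other; skip this combination
        # Joint score: both alignment scores plus the fragment-length prior.
        score = f_ll + r_ll + log_normal_pdf(frag_len, mean_len, sd_len)
        if best is None or score > best[0]:
            best = (score, start, end)
    return best


# Hypothetical example: the forward read maps uniquely, while the reverse
# read falls in a repeat with three equally good placements.
fwd = [(1000, -2.0)]
rev = [(1310, -3.0), (5310, -3.0), (9310, -3.0)]
result = best_pair(fwd, rev)
print(result[1], result[2])
```

In this toy case the three reverse-read placements are indistinguishable on their own, but only one of them implies a fragment length close to the assumed mean, so the pairing information selects it.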

