Abstract

Background: Next-generation DNA sequencing technologies generate tens of millions of sequencing reads in one run. These technologies are now widely used in biology research, for example in genome-wide identification of polymorphisms, transcription factor binding sites, methylation states, and transcript expression profiles. Mapping the sequencing reads to reference genomes efficiently and effectively is one of the most critical analysis tasks. Although several tools have been developed, their performance suffers when multiple substitutions and insertions/deletions (indels) occur together.

Results: We report a new algorithm, Basic Oligonucleotide Alignment Tool (BOAT), that can accurately and efficiently map sequencing reads back to the reference genome. BOAT can handle several substitutions and indels simultaneously, a useful feature for identifying SNPs and other genomic structural variations in functional genomic studies. For better handling of low-quality reads, BOAT supports a "3'-end Trimming Mode" that builds a locally optimized alignment for sequencing reads, further improving sensitivity. BOAT calculates an E-value for each hit as a quality assessment and provides customizable post-mapping filters for further mapping quality control.

Conclusion: Evaluations on both real and simulated datasets suggest that BOAT is capable of mapping large volumes of short reads to reference sequences with better sensitivity and lower memory requirements than other currently existing algorithms. The source code and pre-compiled binary packages of BOAT are publicly available for download at http://boat.cbi.pku.edu.cn under GNU Public License (GPL). BOAT can be a useful new tool for functional genomics studies.

Highlights

  • Next-generation DNA sequencing technologies generate tens of millions of sequencing reads in one run

  • Next-generation sequencing technologies have been widely used in biology research, such as in genome-wide identification of polymorphisms, transcription factor binding sites, methylation states, and transcript expression profiles [1]

  • Basic Oligonucleotide Alignment Tool (BOAT) does not require that all reads have the same length

Summary

Introduction

Next-generation DNA sequencing technologies generate tens of millions of sequencing reads in one run. These technologies are widely used in biology research, for example in genome-wide identification of polymorphisms, transcription factor binding sites, methylation states, and transcript expression profiles. One of the most critical analysis tasks is to map the sequencing reads to reference sequences accurately and efficiently. General alignment tools such as BLAST [3] and BLAT [4] suffer from long running times. New dedicated algorithms such as ELAND (unpublished), SOAP [5], MAQ [6], RMAP [7] and SeqMap [8] have been developed to achieve better mapping efficiency. While these algorithms are effective in handling near-perfect matches, their mapping sensitivity, speed, and/or memory requirements suffer when handling simultaneous multiple substitutions and indels.
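To make the mapping problem concrete, the following is a minimal sketch of locating a short read in a reference while tolerating substitutions and indels together. It uses a textbook semi-global dynamic-programming alignment with unit edit costs, which is an assumption chosen for illustration and is not BOAT's algorithm; the function name `best_hit` and the toy sequences are hypothetical.

```python
def best_hit(read: str, reference: str) -> tuple[int, int]:
    """Return (edit_distance, end) for the best approximate match of `read`.

    `end` is the exclusive index in `reference` where the best match ends.
    Substitutions, insertions and deletions each cost 1 (an assumption).
    """
    # prev[j]: fewest edits aligning the read prefix seen so far to some
    # substring of the reference ending at position j. The empty prefix
    # matches anywhere at zero cost, so the match may start at any position.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(read) + 1):
        # Aligning i read bases against nothing costs i edits.
        curr = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            mismatch = read[i - 1] != reference[j - 1]
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or substitution
                prev[j] + 1,             # extra base in the read
                curr[j - 1] + 1,         # base missing from the read
            )
        prev = curr
    # The read must be fully consumed; the match may end anywhere.
    end = min(range(len(reference) + 1), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    print(best_hit("TAGCCGAT", reference))  # exact substring of the reference
    print(best_hit("TATCGAT", reference))   # one substitution plus one deletion
```

This sketch takes time proportional to the read length times the reference length for every read, which is impractical for tens of millions of reads against a whole genome; that cost is the reason dedicated short-read mappers exist.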
