Abstract

MotivationBased on the next generation genome sequencing technologies, a variety of biological applications are developed, while alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to efficiently and accurately align short reads to the reference genome. However, there are still many reads that can't be mapped to the reference genome, due to the exceeding of allowable mismatches. Moreover, besides the unmapped reads, the reads with low mapping qualities are also excluded from the downstream analysis, such as variance calling. If we can take advantages of the confident segments of these reads, not only can the alignment rates be improved, but also more information will be provided for the downstream analysis.ResultsThis paper proposes a method, called RAUR (Re-align the Unmapped Reads), to re-align the reads that can not be mapped by alignment tools. Firstly, it takes advantages of the base quality scores (reported by the sequencer) to figure out the most confident and informative segments of the unmapped reads by controlling the number of possible mismatches in the alignment. Then, combined with an alignment tool, RAUR re-align these segments of the reads. We run RAUR on both simulated data and real data with different read lengths. The results show that many reads which fail to be aligned by the most popular alignment tools (BWA and Bowtie2) can be correctly re-aligned by RAUR, with a similar Precision. Even compared with the BWA-MEM and the local mode of Bowtie2, which perform local alignment for long reads to improve the alignment rate, RAUR also shows advantages on the Alignment rate and Precision in some cases. Therefore, the trimming strategy used in RAUR is useful to improve the Alignment rate of alignment tools for the next-generation genome sequencing.AvailabilityAll source code are available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.

Highlights

  • Next-generation genome sequencing (NGS) technologies, including Illumina/Solexa and AB/SOLiD, generate billions of short reads (25-200 bp) and become more and more popular

  • This paper proposes a method, called RAUR (Re-align the Unmapped Reads), to re-align the reads that can not be mapped by alignment tools

  • It takes advantages of the base quality scores to figure out the most confident and informative segments of the unmapped reads by controlling the number of possible mismatches in the alignment

Read more

Summary

Introduction

Next-generation genome sequencing (NGS) technologies, including Illumina/Solexa and AB/SOLiD, generate billions of short reads (25-200 bp) and become more and more popular. Many short read alignment algorithms have been developed to address these challenges, different in speed, memory, accuracy, and alignment strategy [10,11]. The other one is Burrow-Wheeler Transform [14], and the representative alignment algorithms are BWA [15], Bowtie2 [16], and SOAP2 [17]. These alignment algorithms are more and more efficient and accurate, there are a portion of reads which are not mapped at all by the alignment tool or the mapping quality scores are less than the threshold

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.