Abstract

Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step in most pipelines for the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well-mapping locations not only poses a severe problem during the alignment phase but also has a significant impact on the results of downstream analyses. We present the multi-mapper resolution (MMR) tool, which infers optimal mapping locations from the coverage density of other mapped reads.

Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses such as transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 15% when using reads of length 51. In addition, MMR decreases alignment file sizes by more than 50%, which reduces the running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its complexity scales linearly with the number of alignments, and it requires no further inputs.

Availability and implementation: Open source code and documentation are available for download at http://github.com/ratschlab/mmr. Comprehensive testing results and further information can be found at http://bioweb.me/mmr.

Contact: andre.kahles@ratschlab.org or gunnar.ratsch@ratschlab.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Summary

Introduction

To address the increasing need for fast and accurate mapping of high-throughput sequencing data to a reference sequence, many different software tools have been developed over the past years, many of which are frequently updated and improved [10, 6, 3, 7]. For a still significantly large fraction of reads (≈10–20%, depending on alignment sensitivity), however, several possible mapping locations exist. We present a simple yet powerful tool, called the Multi-Mapper Resolution tool (MMR), that assigns each read to a unique mapping location such that the overall read coverage across the genome is as uniform as possible. MMR makes use of the critical fraction of unambiguously aligned reads and iteratively selects the alignments of ambiguously mapping reads such that the overall coverage becomes more uniform. We show that this strategy has a positive influence on downstream analyses, such as transcript quantification and prediction.
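The iterative selection described above can be illustrated with a small toy sketch. This is not the MMR implementation: it assumes a one-dimensional genome, alignments given as half-open `(start, end)` intervals, and the variance of coverage in a padded window around each candidate location as a stand-in measure of non-uniformity. All names (`resolve_multimappers`, `window_variance`, `pad`, `max_iter`) are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def add_alignment(coverage, alignment, delta):
    """Add `delta` to the coverage of every position in a half-open interval."""
    start, end = alignment
    for pos in range(start, end):
        coverage[pos] += delta


def window_variance(coverage, start, end, pad):
    """Variance of coverage in the alignment window extended by `pad` on each side."""
    lo = max(0, start - pad)
    hi = min(len(coverage), end + pad)
    return pvariance(coverage[lo:hi])


def placement_loss(coverage, candidates, k, pad):
    """Summed window variance over all candidates if the read sat at candidate k."""
    add_alignment(coverage, candidates[k], +1)
    loss = sum(window_variance(coverage, s, e, pad) for s, e in candidates)
    add_alignment(coverage, candidates[k], -1)
    return loss


def resolve_multimappers(genome_len, unique, ambiguous, pad=5, max_iter=10):
    """Pick one location per ambiguous read so that local coverage is smoother.

    unique:    list of (start, end) for unambiguously aligned reads
    ambiguous: dict read_id -> list of candidate (start, end) locations
    """
    coverage = [0] * genome_len
    for alignment in unique:
        add_alignment(coverage, alignment, +1)

    # Assumption: start from the first reported candidate of each read.
    choice = {read: 0 for read in ambiguous}
    for read, candidates in ambiguous.items():
        add_alignment(coverage, candidates[0], +1)

    for _ in range(max_iter):
        changed = False
        for read, candidates in ambiguous.items():
            current = choice[read]
            add_alignment(coverage, candidates[current], -1)
            losses = [placement_loss(coverage, candidates, k, pad)
                      for k in range(len(candidates))]
            best = current  # ties keep the current placement
            for k, loss in enumerate(losses):
                if loss < losses[best]:
                    best = k
            add_alignment(coverage, candidates[best], +1)
            if best != current:
                choice[read] = best
                changed = True
        if not changed:  # stop once a full pass moves no read
            break
    return choice, coverage


# Two unique reads leave a gap at positions 25-29; the ambiguous read can
# either fill that gap or create an isolated peak at positions 5-9.
choice, coverage = resolve_multimappers(
    genome_len=40,
    unique=[(20, 25), (30, 35)],
    ambiguous={"r1": [(5, 10), (25, 30)]},
)
print(choice)
```

In this toy input, the read is moved from its initial location to the candidate that fills the gap between the two uniquely aligned reads, because that placement leaves both candidate windows with constant coverage. The window size and the variance-based loss are illustrative choices; a real tool would read alignments from a BAM file rather than from in-memory tuples.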

