Abstract

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.

Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner

Highlights

  • Graphs provide a natural way of expressing variation or uncertainty in a genome [1, 2]

  • Here, we provide the first algorithm for banded sequence-to-graph alignment that scales to align noisy long reads to de Bruijn graphs of whole human genomes

  • We present a simple genotyping pipeline based on building a pangenome graph and aligning long reads to it


Summary

Introduction

Graphs provide a natural way of expressing variation or uncertainty in a genome [1, 2]. They have been used for diverse applications such as genome assembly [3,4,5], error correction [6,7,8], short tandem repeat genotyping [9], structural variation genotyping [10], and reference-free haplotype reconstruction [11]. Aligning sequences to graphs is a newer field, and practical tools are only starting to emerge; most existing tools are specialized for a single purpose, such as error correction [6,7,8] or hybrid genome assembly [4]. There is presently a lack of general-purpose tools for aligning long third-generation sequencing

