GraphAligner: rapid and versatile sequence-to-graph alignment

Mikko Rautiainen, Tobias Marschall

doi:10.1186/s13059-020-02157-2

Abstract

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.

Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner

Highlights

Graphs provide a natural way of expressing variation or uncertainty in a genome [1, 2]
Here, we provide the first algorithm for banded sequence-to-graph alignment that scales to align noisy long reads to de Bruijn graphs of whole human genomes
We present a simple genotyping pipeline based on building a pangenome graph and aligning long reads to it
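
To make the notion of sequence-to-graph alignment concrete, the following is a minimal sketch of the textbook edit-distance recurrence generalized from a linear reference to a directed acyclic graph. It is not GraphAligner's algorithm: it is unbanded, runs in time proportional to read length times graph size, and returns only the distance, not the path. The function name `graph_edit_distance`, the one-character-per-node representation, and the toy graph (a single G/T variant) are all assumptions made for illustration.

```python
def graph_edit_distance(labels, preds, read):
    """Minimum edit distance between `read` and any path in a DAG.

    labels[v] is the single character on node v; preds[v] lists the
    predecessors of v. Nodes must be numbered in topological order and
    the graph must be non-empty. The read is aligned end to end, while
    the path may start and end at any node (semi-global alignment).
    """
    m = len(read)
    # Virtual start row: consuming j read characters before touching
    # the graph costs j insertions.
    start = list(range(m + 1))
    rows = []
    for v, ch in enumerate(labels):
        # An alignment may begin at v (virtual start) or continue
        # from any predecessor of v.
        sources = [start] + [rows[p] for p in preds[v]]
        row = [0] * (m + 1)  # row[0] = 0: skipping graph prefix is free
        for j in range(1, m + 1):
            sub = 0 if read[j - 1] == ch else 1
            best = row[j - 1] + 1  # insertion: extra read character
            for prev in sources:
                best = min(best,
                           prev[j - 1] + sub,  # match or mismatch
                           prev[j] + 1)        # deletion: skip node v
            row[j] = best
        rows.append(row)
    # The alignment may end at any node.
    return min(row[m] for row in rows)


# Toy variation graph: A -> C -> (G | T) -> A
labels = ["A", "C", "G", "T", "A"]
preds = [[], [0], [1], [1], [2, 3]]

for read in ["ACGA", "ACTA", "ACCA", "CT"]:
    print(read, graph_edit_distance(labels, preds, read))
```

Both alleles, `ACGA` and `ACTA`, align to the graph without edits, whereas a linear reference containing only one of them would penalize the other. Each node takes the minimum over all incoming edges, which is the only change from the linear recurrence.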

Summary

Introduction

Graphs provide a natural way of expressing variation or uncertainty in a genome [1, 2]. They have been used for diverse applications such as genome assembly [3,4,5], error correction [6,7,8], short tandem repeat genotyping [9], structural variation genotyping [10], and reference-free haplotype reconstruction [11]. Aligning sequences to graphs is a newer field, and practical tools are only starting to emerge; most existing tools are specialized for one purpose, such as error correction [6,7,8] or hybrid genome assembly [4]. There is presently a lack of general-purpose tools for aligning long third-generation sequencing

Methods

Results

Discussion

Conclusion