Sequence Alignment on Directed Graphs.

Vaddadi Naga Sai Kavya, Naveen Sivadasan, Kshitij Tayal, Rajgopal Srinivasan

doi:10.1089/cmb.2017.0264

Open Access

https://doi.org/10.1089/cmb.2017.0264

Abstract

Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices, and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs, possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions can blow up considerably in size; in the worst case, the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm, V-ALIGN, that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
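
The idea of filling a DP table directly on a graph that may contain cycles can be illustrated with a small sketch. This is not the V-ALIGN formulation from the paper: it is a generic relaxation-based DP with a linear gap penalty, and it does not use the feedback vertex set refinement. The function name `align_to_graph`, the single-character vertex labels, and the unit costs are assumptions made for illustration only.

```python
def align_to_graph(seq, labels, edges, mismatch=1, gap=1):
    """Minimum edit cost between `seq` and any path in a directed graph.

    labels: dict vertex -> single character (assumed for simplicity)
    edges:  dict vertex -> list of successor vertices (cycles allowed)
    The path may start and end at any vertex, or be empty.
    """
    if gap <= 0:
        raise ValueError("gap must be positive so the in-row relaxation terminates")
    vertices = list(labels)
    preds = {v: [] for v in vertices}
    for u in vertices:
        for v in edges.get(u, []):
            preds[v].append(u)

    def relax_deletions(row):
        # Deleting a vertex moves along an edge without consuming a sequence
        # character, so a row depends on itself. On a cyclic graph there is
        # no topological order, so relax repeatedly until nothing improves.
        changed = True
        while changed:
            changed = False
            for u in vertices:
                for v in edges.get(u, []):
                    if row[u] + gap < row[v]:
                        row[v] = row[u] + gap
                        changed = True

    # Row 0: no sequence character consumed; the cheapest path ending at v
    # is the single vertex v, deleted. All entries are equal, so no
    # relaxation is needed.
    prev = {v: gap for v in vertices}
    for i in range(1, len(seq) + 1):
        ch = seq[i - 1]
        cur = {}
        for v in vertices:
            sub = 0 if labels[v] == ch else mismatch
            best = (i - 1) * gap + sub          # path starts at v, ch aligned to v
            best = min(best, i * gap + gap)     # path starts at v, v deleted
            best = min(best, prev[v] + gap)     # ch inserted after v
            for u in preds[v]:
                best = min(best, prev[u] + sub)  # ch aligned to v, coming from u
            cur[v] = best
        relax_deletions(cur)
        prev = cur
    # The empty path costs one gap per sequence character.
    return min([len(seq) * gap] + list(prev.values()))
```

On a two-vertex cycle labeled `A` and `C`, the sequence `ACACA` aligns at zero cost without any unrolling, whereas a DAG-based method would first need to unroll the cycle to a depth that grows with the sequence length. The repeated relaxation makes this sketch slower in the worst case than the linear dependence the abstract reports for the refined algorithm.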

Highlights

Most state-of-the-art high-throughput genome studies rely heavily on a high-quality reference genome [1]
V-ALIGN is based on a novel dynamic programming formulation that allows gapped alignment with affine, linear, or constant gaps directly on the input graph
When the alignment is restricted to a filtered set of subgraphs, which is done for improved efficiency, V-ALIGN can be used for aligning to these candidate subgraphs

Summary

Introduction

Most state-of-the-art high-throughput genome studies rely heavily on a high-quality reference genome [1]. Various graph data structures, with subtle distinctions, have been studied in the literature for pangenome representation [3]. These include De Bruijn graphs [7], [8], ABruijn graphs [9], Enredo graphs [10], Cactus graphs [5], [11], Population Reference graphs [6], String graphs [12], and Variation graphs [2]. In variation graphs [2], the common subsequences are encoded as labeled vertices and variations are represented using additional vertices and directed edges. Such representations have shown promise in improving read mapping and variant calling performance [4]. Graph-based references have necessitated the development of graph-based computational pipelines for genome analyses [3], [2], [4].
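
To make the variation graph description concrete, here is a minimal sketch of such a structure. It is not taken from the paper or from any existing toolkit: the class name `VariationGraph`, its methods, and the example sequences are illustrative assumptions. Common subsequences are stored as labeled vertices, and a single-nucleotide variant is represented by two alternative vertices joined by directed edges.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Hypothetical container: labeled vertices plus directed edges."""
    labels: dict = field(default_factory=dict)  # vertex id -> sequence label
    edges: dict = field(default_factory=dict)   # vertex id -> successor ids

    def add_vertex(self, vid, label):
        self.labels[vid] = label
        self.edges.setdefault(vid, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def spell(self, path):
        """Concatenate the labels along a path, checking that each step is an edge."""
        for a, b in zip(path, path[1:]):
            if b not in self.edges[a]:
                raise ValueError(f"no edge {a} -> {b}")
        return "".join(self.labels[v] for v in path)


def build_snp_example():
    """Shared flanks ACG and GGA around a T/C variant (made-up sequences)."""
    g = VariationGraph()
    g.add_vertex(1, "ACG")  # common prefix
    g.add_vertex(2, "T")    # one allele
    g.add_vertex(3, "C")    # alternative allele
    g.add_vertex(4, "GGA")  # common suffix
    for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        g.add_edge(src, dst)
    return g
```

Each path through the graph spells one haplotype: in this example, the path 1, 2, 4 spells `ACGTGGA` and the path 1, 3, 4 spells `ACGCGGA`.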

Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: Sep 8, 2018
Citations: 18
License type: cc-by
