Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG

Manuel Cáceres, Massimo Cairo, Brendan Mumey, Kristoffer Sahlin, Edin Husić, Alexandru I. Tomescu, Romeo Rizzi

doi:10.1109/tcbb.2021.3131203

Abstract

A multi-assembly problem asks us to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph, namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms for finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their greater length compared to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs.
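To make the notion of safety concrete, here is a minimal brute-force sketch. It is not the paper's algorithm (which is efficient); it simply enumerates solutions on a tiny graph. It assumes, for illustration only, that every path runs from a source to a sink, that paths in a cover may share vertices, and that only minimum-size path covers count as solutions. A query path is then safe if, in every such cover, some path contains it as a contiguous subpath. All function names are hypothetical, and the running time is exponential.

```python
from itertools import combinations


def source_sink_paths(graph):
    """Enumerate every path from a source (in-degree 0) to a sink (out-degree 0).

    `graph` maps each vertex to the list of its out-neighbours; every vertex
    must appear as a key. Exponential in general: for tiny examples only.
    """
    indeg = {v: 0 for v in graph}
    for succs in graph.values():
        for w in succs:
            indeg[w] += 1

    paths = []

    def extend(path):
        last = path[-1]
        if not graph[last]:  # reached a sink
            paths.append(tuple(path))
            return
        for w in graph[last]:
            extend(path + [w])

    for v in graph:
        if indeg[v] == 0:
            extend([v])
    return paths


def minimum_path_covers(graph):
    """Return every minimum-size set of source-to-sink paths covering all vertices."""
    paths = source_sink_paths(graph)
    vertices = set(graph)
    for k in range(1, len(paths) + 1):
        covers = [c for c in combinations(paths, k) if set().union(*c) == vertices]
        if covers:
            return covers
    return []


def is_subpath(query, path):
    """True if `query` occurs as a contiguous stretch of `path`."""
    n = len(query)
    return any(path[i:i + n] == query for i in range(len(path) - n + 1))


def is_safe(query, graph):
    """Hypothetical checker: is `query` a subpath of some path in EVERY minimum cover?"""
    query = tuple(query)
    return all(
        any(is_subpath(query, p) for p in cover)
        for cover in minimum_path_covers(graph)
    )
```

For example, in the graph with edges a→c, b→c, c→d, c→e there are exactly two minimum covers, {acd, bce} and {ace, bcd}. Under these assumptions (a, c) and (c, d) are safe, while (a, c, d) is not, because the second cover never traverses it.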

Highlights

We consider a natural generalization of the classical minimum path cover problem that includes additional practical constraints, which we call constrained path covers.
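For reference, the size of a classical minimum path cover can be computed with a standard textbook reduction; this is not the paper's method for constrained covers. Assuming paths may share vertices, the minimum number of paths covering a DAG equals its width (Dilworth's theorem), which in turn equals n minus the size of a maximum matching in the bipartite graph linking u to v whenever v is reachable from u. The sketch below builds the transitive closure by DFS and uses Kuhn's augmenting-path matching; the function name is hypothetical.

```python
def min_path_cover_size(graph):
    """Minimum number of (possibly overlapping) paths covering all vertices of a DAG.

    `graph` maps each vertex to the list of its out-neighbours (every vertex is a key).
    """
    vertices = list(graph)

    # Transitive closure: reach[s] = set of vertices reachable from s by >= 1 edge.
    reach = {}
    for s in vertices:
        seen = set()
        stack = list(graph[s])
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(graph[u])
        reach[s] = seen

    # Kuhn's algorithm: match left copy u to right copy v when v is reachable from u.
    match_right = {}  # right vertex -> left vertex currently matched to it

    def try_augment(u, visited):
        for v in reach[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match_right or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in vertices)
    return len(vertices) - matching
```

On the graph with edges a→c, b→c, c→d, c→e this returns 2 (for instance acd and bce), whereas vertex-disjoint paths would need three; using the transitive closure is what lets the covering paths overlap.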


Introduction

Many real-world problems require reconstructing an unknown object from partial data observed from it. Genome assembly is a typical instance of such a problem in bioinformatics: given a set of high-throughput sequencing reads obtained from some genomic sequence, we need to reconstruct the sequence from which the reads originate. A major issue in such problems is that multiple solutions (reconstructions) can explain the observed data, making it difficult to identify the correct one. As such, reporting one arbitrary solution may lead to an incorrect answer. An established way of coping with this issue is to report only partial solutions that we are confident are correct. State-of-the-art genome assemblers do not output entire chromosomes, but only contigs, namely genomic fragments that are promised to occur in the original genome.
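The classical example of such a guaranteed fragment is the unitig, which the abstract uses as the baseline for comparison. The sketch below assumes one common compaction rule (an edge u→w is merged when it is u's only out-edge and w's only in-edge); the paper's exact definition may differ in detail, and the function name is hypothetical. If every path is assumed to run from a source to a sink, any path touching such a chain must traverse all of it, which is why unitigs are safe.

```python
def compacted_unitigs(graph):
    """Split the vertices of a DAG into maximal chains under the compaction rule above.

    `graph` maps each vertex to the list of its out-neighbours (every vertex is a key).
    """
    preds = {v: [] for v in graph}
    for u, succs in graph.items():
        for w in succs:
            preds[w].append(u)

    def merges_into(u, w):
        # Edge u->w is compacted if it is u's only out-edge and w's only in-edge.
        return len(graph[u]) == 1 and len(preds[w]) == 1

    unitigs = []
    for v in graph:
        # Skip v if it continues the chain of its unique predecessor.
        if len(preds[v]) == 1 and merges_into(preds[v][0], v):
            continue
        chain = [v]
        # Extend forward while the next edge can be compacted.
        while len(graph[chain[-1]]) == 1 and merges_into(chain[-1], graph[chain[-1]][0]):
            chain.append(graph[chain[-1]][0])
        unitigs.append(tuple(chain))
    return unitigs
```

In the graph with edges a→c, b→c, c→d, c→e every unitig under this rule is a single vertex, even though a path such as (a, c) occurs in every minimum path cover there; this is the kind of gap that lets maximal safe paths be longer than unitigs.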


Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication date: Nov 1, 2022
Citations: 4
License: CC BY 4.0
