Abstract

An important goal in microbial computational genomics is to identify crucial events in the evolution of a gene that severely alter the duplication, loss, and mobilization patterns of the gene within the genomes in which it disseminates. In this article, we formalize this microbiological goal as a new pattern-matching problem in the domain of gene tree and species tree reconciliation, denoted "Reconciliation-Scenario Altering Mutation (RSAM) Discovery." We propose an O(m · n · k) time algorithm to solve this new problem, where m and n are the number of vertices of the input gene tree and species tree, respectively, and k is a user-specified upper bound on the number of optimal solutions of interest. The algorithm first constructs a hypergraph representing the k highest-scoring reconciliation scenarios between the given gene tree and species tree, and then interrogates this hypergraph for subtrees matching a prespecified RSAM pattern. Our algorithm is optimal in the sense that the number of hypernodes in the hypergraph can be lower bounded by Ω(m · n · k). We implement the new algorithm as a tool, called RSAM-finder, and demonstrate its application to the identification of RSAMs in toxins and drug-resistance elements across a data set spanning hundreds of species.
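The abstract does not spell out how an RSAM pattern is represented, so the following is only a minimal sketch of the second stage (searching the hypergraph for matching subtrees), assuming a hypergraph whose hypernodes pair a gene-tree vertex with a species-tree vertex and a reconciliation event, and whose hyperedges list the child hypernodes of one alternative scenario. The `HyperNode` class, the event codes (`S` speciation, `D` duplication, `T` transfer, `C` contemporary leaf), `match_pattern`, and the example pattern ("an event of one type with at least a given number of events of another type below it") are all hypothetical illustrations, not RSAM-finder's actual query language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HyperNode:
    gene: str     # gene-tree vertex
    species: str  # species-tree vertex the gene vertex is mapped to
    event: str    # hypothetical labels: "S", "D", "T", "C"
    # Each inner list is the tail of one hyperedge, i.e. one alternative
    # way of continuing the scenario below this hypernode.
    children: List[List[str]] = field(default_factory=list)


def reachable(graph: Dict[str, HyperNode], start: str) -> Set[str]:
    """Ids of all hypernodes reachable from `start` through any hyperedge."""
    seen: Set[str] = set()
    stack = [c for edge in graph[start].children for c in edge]
    while stack:
        nid = stack.pop()
        if nid in seen:
            continue
        seen.add(nid)
        stack.extend(c for edge in graph[nid].children for c in edge)
    return seen


def match_pattern(graph: Dict[str, HyperNode], top_event: str,
                  below_event: str, min_below: int) -> List[str]:
    """Hypernodes labelled `top_event` with >= `min_below` `below_event` descendants."""
    hits = []
    for nid, node in graph.items():
        if node.event != top_event:
            continue
        below = sum(1 for d in reachable(graph, nid) if graph[d].event == below_event)
        if below >= min_below:
            hits.append(nid)
    return sorted(hits)


# Hypothetical toy hypergraph: a transfer ("T") followed by two duplications.
toy_graph: Dict[str, HyperNode] = {
    "r": HyperNode("g0", "s0", "S", [["a", "x"]]),
    "a": HyperNode("g1", "s1", "T", [["b", "c"]]),
    "b": HyperNode("g2", "s2", "D", [["d", "e"]]),
    "d": HyperNode("g3", "s2", "D", [["f", "h"]]),
    "c": HyperNode("g4", "s3", "C"),
    "e": HyperNode("g5", "s2", "C"),
    "f": HyperNode("g6", "s2", "C"),
    "h": HyperNode("g7", "s2", "C"),
    "x": HyperNode("g8", "s4", "C"),
}

if __name__ == "__main__":
    print(match_pattern(toy_graph, "T", "D", 2))  # ['a']
```

Because `reachable` merges the descendants of all alternative hyperedges, the count pools events from different scenarios; a per-scenario check would need to enumerate one hyperedge choice per hypernode.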

Highlights

  • Prokaryotes can be found in the most diverse and severe ecological niches of the planet

  • We say that the mutation has a causal association with the observed dissemination pattern of the mutated gene

  • Our hypergraph-ensemble approach is based on a model proposed by [24] for network evolution, which we extend and adapt here to the DLT model

Summary

Introduction

Prokaryotes can be found in the most diverse and severe ecological niches of the planet. Our hypergraph-ensemble approach is based on a model proposed by [24] for network evolution, which we extend and adapt here to the duplication-loss-transfer (DLT) model. The resulting hypergraph of k-best reconciliations, intended to provide some robustness to the noise typical of this data, serves as the search space for the pattern-matching stage. We adapt the approach proposed by Bansal et al. [3] for the basic, one-best variant of DLT reconciliation, extending it to an efficient k-best variant. This yields an O(m · n · k) time algorithm for the problem, where m and n are the number of vertices of the input gene tree and species tree, respectively, and k is a user-specified upper bound on the number of optimal solutions of interest. Our tool RSAM-finder provides users with a query language that can express more robust patterns, tailored to the various applications in which the pattern search is employed.
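To make the k-best extension more concrete, here is a minimal sketch that computes the k lowest scenario costs at each hypernode of a small toy hypergraph. It assumes scores are treated as costs to minimize, as in parsimony-based DLT reconciliation, and that costs are additive (an event cost plus the costs of the two child sub-scenarios). The sorted k-best lists of the two children are merged with a lazy heap search, a standard k-best technique that is not necessarily the one behind the authors' O(m · n · k) algorithm. The names `k_smallest_sums`, `k_best`, and the `toy` hypergraph are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# A hyperedge into a hypernode: (event_cost, (left_child, right_child)).
Hyperedge = Tuple[float, Tuple[str, str]]


def k_smallest_sums(a: List[float], b: List[float], k: int) -> List[float]:
    """Return the k smallest values of a[i] + b[j]; a and b must be sorted ascending.

    Lazy frontier search: pop the cheapest pair (i, j) and push its neighbours
    (i + 1, j) and (i, j + 1), so only O(k) pairs are ever examined.
    """
    if not a or not b or k <= 0:
        return []
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out: List[float] = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out


def k_best(graph: Dict[str, List[Hyperedge]], node: str, k: int,
           memo: Optional[Dict[str, List[float]]] = None) -> List[float]:
    """k lowest scenario costs rooted at `node`, memoized per hypernode."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    edges = graph.get(node, [])
    if not edges:  # leaf hypernode: exactly one zero-cost scenario
        memo[node] = [0.0]
        return memo[node]
    candidates: List[float] = []
    for cost, (left, right) in edges:
        # Each alternative hyperedge contributes its own k best combinations.
        sums = k_smallest_sums(k_best(graph, left, k, memo),
                               k_best(graph, right, k, memo), k)
        candidates.extend(cost + s for s in sums)
    memo[node] = heapq.nsmallest(k, candidates)
    return memo[node]


# Hypothetical toy hypergraph; node names and costs are made up for illustration.
toy: Dict[str, List[Hyperedge]] = {
    "root": [(1.0, ("x", "y")), (2.0, ("x", "z"))],
    "x": [(0.5, ("leaf1", "leaf2")), (1.5, ("leaf1", "leaf2"))],
    "y": [(0.0, ("leaf1", "leaf2"))],
    "z": [(0.0, ("leaf1", "leaf2")), (1.0, ("leaf1", "leaf2"))],
}

if __name__ == "__main__":
    print(k_best(toy, "root", 3))  # [1.5, 2.5, 2.5]
```

Memoizing per hypernode means each sub-scenario list is computed once and reused by every hyperedge that points to it, which is what lets a k-best table be shared across the alternative scenarios encoded in the hypergraph.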

Preliminaries
Hypergraph of k-Best Scenarios
Stage 1
Stage 2
Stage 3
Applications
Methods and Data