Abstract

As genome sequencing efforts unveil the genetic diversity of the biosphere at unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, which represent key functional modules of the cell. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, in contrast to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries.
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
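
To make the idea of a parsimony-based reconstruction concrete, the following is a minimal sketch of the classic Fitch bottom-up pass for a single binary character (pathway present = 1, absent = 0) on a rooted binary tree. It is an illustration only: the abstract does not state which parsimony variant PathTrace uses, and the tree, the taxon names, and the leaf states below are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, leaf_states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate states at this node, minimum number of state changes).

    Standard Fitch rule: take the intersection of the child state sets if it
    is non-empty; otherwise take the union and count one extra change.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_states)
    right_set, right_changes = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical example: four taxa, pathway absent only in taxon B.
tree: Tree = (("A", "B"), ("C", "D"))
states = {"A": 1, "B": 0, "C": 1, "D": 1}

root_states, changes = fitch(tree, states)
print(root_states, changes)  # {1} 1
```

In this toy case the most parsimonious explanation is that the pathway was present at the root and lost once, on the branch leading to taxon B.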

Highlights

  • The investigation of evolutionary histories of characters in terms of structure and function lies at the heart of biological research [1]

  • We describe a parsimonious method based on the ancestral inference of gene content, using metabolic pathways and entire pangenomes (collections of genomic sequences across taxonomic groups)

  • By extending the notion of a phylogenetic profile to a composite group, represented by the enzyme profiles of a pathway against a target selection of genomes that is typically organized into a pangenome, PathTrace can unambiguously detect the presence or absence of a pathway across a phylogeny
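
The composite profile described in the highlights can be sketched as follows. This is a hedged illustration, not the published implementation: it assumes that the taxonomic conservation value of an enzyme is the fraction of genomes in a target set where the enzyme is detected, and that a pathway is summarized by the mean of its enzyme values. The function names, enzyme labels, and genome labels are hypothetical.

```python
from typing import Dict, List

# Phylogenetic profile of one enzyme: genome name -> detected (True/False).
Profile = Dict[str, bool]


def conservation(profile: Profile, genomes: List[str]) -> float:
    """Fraction of the target genomes in which the enzyme is detected."""
    if not genomes:
        return 0.0
    return sum(profile.get(g, False) for g in genomes) / len(genomes)


def pathway_vector(enzyme_profiles: Dict[str, Profile],
                   genomes: List[str]) -> Dict[str, float]:
    """Fuzzy vector for a pathway: one conservation value per enzyme."""
    return {enzyme: conservation(profile, genomes)
            for enzyme, profile in enzyme_profiles.items()}


def pathway_score(vector: Dict[str, float]) -> float:
    """Assumed summary: mean conservation over the pathway's enzymes."""
    return sum(vector.values()) / len(vector) if vector else 0.0


# Hypothetical pathway with three enzymes profiled against four genomes.
genomes = ["g1", "g2", "g3", "g4"]
profiles = {
    "E1": {"g1": True, "g2": True, "g3": True, "g4": False},
    "E2": {"g1": True, "g2": True, "g3": False, "g4": False},
    "E3": {"g1": True, "g2": True, "g3": True, "g4": True},
}

vector = pathway_vector(profiles, genomes)
print(vector)                 # {'E1': 0.75, 'E2': 0.5, 'E3': 1.0}
print(pathway_score(vector))  # 0.75
```

Keeping per-enzyme values in the unit interval, rather than a hard present/absent call, means that a single missed or spurious enzyme hit shifts the pathway summary only slightly.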


Summary

Introduction

The investigation of evolutionary histories of characters in terms of structure and function lies at the heart of biological research [1]. Earlier approaches are mainly based on gene content or protein-sequence reconstructions [5,6,7,8]. Phylogenetic profiles and their various incarnations, coupled with sophisticated simulation experiments, have been extensively used. In this collective effort, a recurring theme has always been the identification and further validation of lateral gene transfers [13,14,15], which present significant challenges to the underlying evolutionary models [16,17].

