Abstract

Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm informed by an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species for four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. A principal component analysis of the phylogenetic profiles revealed three clusters of reference species, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbial species but also in higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes.

Highlights

  • Functional associations between genes are often inferred from similar genomic context

  • Pathway genes may be inherited unevenly among species, and identifying the taxonomic groups within which pathway genes are co-inherited may provide new insights into improving network inference based on inheritance profiles

  • Inheritance profiles of the query species genomes across reference species were represented as vectors in principal component analysis (PCA) biplots, each of which displays a pair of principal components of the phylogenetic profiles



Introduction

Functional associations between genes are often inferred from similar genomic context. Phylogenetic profiling, which predicts the functional association between two genes from the correlation of their phylogenetic distributions, has been more thoroughly investigated than other genomic context-based network inference methods [1] because it capitalizes on the complex evolutionary co-inheritance pattern of pathway genes during speciation [2]. If two genes have similar phylogenetic profiles across reference species, they are likely to have been co-inherited to carry out a joint function. Accounting for ‘profile complexity’ (i.e., the complexity of the inheritance pattern) can improve network inference: the more complex the phylogenetic profiles, the more likely it is that the inferred co-functional relationship exists [5]. The incorporation of phylogenetic relationships among reference species has also been shown to improve network inference [6].
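
To make the idea of profile similarity concrete, the following is a minimal sketch rather than the procedure used in this study. It assumes that each gene's phylogenetic profile is a binary presence/absence vector over the reference species and scores a gene pair by mutual information, which is one common choice of similarity measure; the excerpt above does not state which measure the authors used. The function names, the domain labels, and the toy profiles are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length binary profiles."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # counts of each (presence_a, presence_b) pair
    count_a = Counter(a)         # marginal counts for gene A
    count_b = Counter(b)         # marginal counts for gene B
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def within_domain_scores(profile_a, profile_b, domains):
    """Score a gene pair separately on the reference species of each domain."""
    scores = {}
    for domain in sorted(set(domains)):
        idx = [i for i, d in enumerate(domains) if d == domain]
        scores[domain] = mutual_information(
            [profile_a[i] for i in idx],
            [profile_b[i] for i in idx],
        )
    return scores


# Hypothetical toy data: 12 reference species, 4 per domain.
domains = ["Bacteria"] * 4 + ["Archaea"] * 4 + ["Eukaryota"] * 4
# 1 = an ortholog of the query gene is present in that reference species.
gene_a = [1, 0, 1, 0,  1, 1, 0, 0,  1, 1, 1, 1]
gene_b = [1, 0, 1, 0,  1, 0, 1, 0,  1, 1, 1, 1]

all_score = mutual_information(gene_a, gene_b)               # all species pooled
scores = within_domain_scores(gene_a, gene_b, domains)       # one score per domain

print("all species:", round(all_score, 3))
for domain, score in scores.items():
    print(domain, round(score, 3))
```

In this toy case the two genes are perfectly co-inherited among the bacterial reference species, unrelated among the archaeal ones, and uninformative (always present) among the eukaryotic ones. Pooling all species gives a lower score than the within-Bacteria score, which illustrates how inheritance patterns from other taxonomic groups can dilute a within-group co-inheritance signal.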

