Abstract

A challenge in phylogenetic inference of gene trees is how to sample a large pool of homologous sequences so as to derive a representative subset. The need arises in several situations, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods cannot handle a large pool of sequences because of their high demand for computing resources; (2) applications that analyze a collection of gene trees prefer trees with fewer operational taxonomic units (OTUs), for instance when detecting horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, subsamples were often created by manual selection. Here we present AST, an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, which selects representative sequences that maximize the taxonomic diversity of the sample. To demonstrate the effectiveness of AST, we applied it to four problems: inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, of 16S ribosomal RNAs, and of glycosyltransferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of the computational results is almost as good as that of manual inference by domain experts, which makes the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.

Highlights

  • Reconstruction of gene trees represents a commonly encountered problem in evolutionary studies, such as inferring the evolutionary history of a gene [1,2], finding the origin of a gene, discovering the function of a gene [3,4], and estimating species trees from gene trees [5,6,7,8,9]

  • We applied the AST method to resolve the following evolutionary questions: (i) can we infer the evolutionary history of the small ribosomal subunit (SSU) protein S5, 16S ribosomal RNA (16S rRNA) and glycosyltransferase gene family 8 (GT8), and (ii) can we identify ancient horizontal gene transfer (HGT) from bacteria to eukaryotes?

  • We assessed the performance of AST on both simulated and real biological data, and compared its results with those of SS, random sampling (RS) and manual selection (MS)

Summary

Introduction

Reconstruction of gene trees is a commonly encountered problem in evolutionary studies, such as inferring the evolutionary history of a gene (or a gene family) [1,2], finding the origin of a gene, discovering the function of a gene [3,4], and estimating species trees from gene trees [5,6,7,8,9]. Reconstructing the phylogenetic history of a gene (or gene family) generally involves three steps: (1) selection of homologous sequences (DNA, RNA, or protein sequences); (2) multiple sequence alignment (MSA); and (3) phylogenetic tree reconstruction. Taxonomic sampling of species trees, by contrast, refers to sampling taxa based on genetic markers or whole genomes rather than on sequences of genes or proteins [22,23]. An additional reason for sequence sampling is to facilitate the detection of horizontal gene transfer (HGT) based on phylogenetic tree comparisons. To obtain gene trees with high taxonomic diversity, we developed an algorithm named AST that automatically selects representative homologous sequences across taxa. We applied the AST method to resolve the following evolutionary questions: (i) can we infer the evolutionary history of the small ribosomal subunit (SSU) protein S5 (rpS5), 16S ribosomal RNA (16S rRNA) and glycosyltransferase gene family 8 (GT8), and (ii) can we identify ancient HGTs from bacteria to eukaryotes?
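
To make the idea of taxonomy-aware sequence sampling concrete, the following is a minimal sketch and not the AST algorithm itself, whose details are not given in this summary. It assumes each homologous sequence comes with a taxonomic lineage, groups sequences at one rank, and picks representatives round-robin across the groups so that no single well-studied group dominates the sample. The function name `sample_by_taxon`, the `rank` parameter, and the record layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_by_taxon(records, k, rank=0):
    """Pick up to k sequence ids, spreading picks across taxonomic groups.

    records: iterable of (seq_id, lineage), where lineage is a tuple of
             taxon names ordered from the broadest rank downwards.
    rank:    index into the lineage used for grouping (hypothetical knob).
    Illustrative only; this is not the published AST procedure.
    """
    if k <= 0:
        return []

    # Group sequence ids by the taxon found at the chosen rank.
    groups = defaultdict(list)
    for seq_id, lineage in records:
        groups[lineage[rank]].append(seq_id)

    # Sort group names so the result is deterministic.
    ordered = [groups[name] for name in sorted(groups)]

    # Round-robin: take one sequence per group per pass until k are picked.
    picked = []
    for one_pass in zip_longest(*ordered):
        for seq_id in one_pass:
            if seq_id is None:
                continue  # this group is already exhausted
            picked.append(seq_id)
            if len(picked) == k:
                return picked
    return picked
```

With a pool biased towards one group (for example, many bacterial sequences and a single archaeal one), the first pass still takes one sequence from every group before any group contributes a second, which is the behaviour random sampling does not guarantee.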

Materials and Methods
Results
Comparative analyses of tree construction on simulated data
Inference of the evolutionary history of a gene or gene family