Abstract

In recent years, the increasing availability of genomic resources has provided an opportunity to develop phylogenetic markers for phylogenomics. Efficient methods to search for candidate markers among the huge number of genes in genomic data are particularly needed in the era of phylogenomics. Here, rather than using the traditional approach of comparing the genomes of two distantly related taxa to develop conserved primers, we take advantage of the multiple genome alignment resources from the University of California, Santa Cruz (UCSC) Genome Browser and present a simple and straightforward bioinformatic approach to automatically screen for candidate nuclear protein-coding locus (NPCL) markers. We tested our protocol in tetrapods and obtained 21 new NPCL markers with high polymerase chain reaction (PCR) amplification success rates (mostly over 80%) in 16 diverse tetrapod taxa. These 21 newly developed markers, together with two reference genes (RAG1 and mitochondrial 12S-16S), were used to infer the higher-level relationships of tetrapods, with emphasis on the debated position of turtles. Maximum likelihood (ML) and Bayesian analyses of the concatenated data combining the 23 markers (21,137 bp) yield the same tree, with ML bootstrap values over 95% and Bayesian posterior probabilities equal to 1.0 for most nodes. Species tree estimation with the program BEST, without data concatenation, produces similar results. In all analyses, turtles are robustly recovered as the sister group of Archosauria (birds and crocodilians). A jackknife analysis of the concatenated data shows that the minimum sequence length needed to robustly resolve the position of turtles is 13-14 kb. Based on the large 23-gene data set and the well-resolved tree, we also estimated evolutionary timescales for tetrapods with the popular Bayesian method MultiDivTime. Most of the estimated ages among tetrapods are similar to the average estimates of previous dating studies summarized in the book The Timetree of Life.
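Screening candidate loci from the UCSC multiple genome alignments can be pictured as a filter over alignment blocks. The following is a minimal sketch, assuming the alignments are stored in UCSC's MAF (Multiple Alignment Format) text format. The species names, the sample block, and the two criteria used here (every required species present, and a minimum block length) are hypothetical illustrations, not the screening criteria used in the study. Real UCSC MAF source fields use assembly names such as xenTro1 rather than the toy labels below.

```python
import io
from typing import Dict, Iterable, Iterator, Set, TextIO


def read_maf_blocks(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield each MAF alignment block as {species: aligned_sequence}.

    Only 's' (sequence) lines are read. The species key is the part of the
    source field before the first '.', e.g. 'frog.scaffold_1' -> 'frog'.
    If a species occurs twice in a block, the later row overwrites the earlier.
    """
    block: Dict[str, str] = {}
    for raw in handle:
        line = raw.strip()
        if line.startswith("a ") or line == "a":
            # An 'a' line opens a new block; flush any unfinished one.
            if block:
                yield block
            block = {}
        elif line.startswith("s "):
            # Fields: s src start size strand srcSize text
            fields = line.split()
            species = fields[1].split(".", 1)[0]
            block[species] = fields[6]
        elif not line and block:
            # A blank line ends the current block.
            yield block
            block = {}
    if block:
        yield block


def screen_blocks(blocks: Iterable[Dict[str, str]], required: Set[str],
                  min_len: int) -> Iterator[Dict[str, str]]:
    """Keep blocks that contain every required species and span at least
    min_len alignment columns (hypothetical criteria, for illustration)."""
    for block in blocks:
        if required.issubset(block) and all(
                len(block[s]) >= min_len for s in required):
            yield block


# Toy input: species labels, coordinates and sequences are made up.
SAMPLE = """##maf version=1
a score=10.0
s frog.scaffold_1 100 7 + 5000 ACGT-ACG
s chicken.chr1 200 8 + 9000 ACGTTACG
s zebrafish.chr3 50 7 - 7000 ACGA-ACG

a score=5.0
s frog.scaffold_2 10 4 + 5000 ACGT
s chicken.chr2 20 4 + 9000 ACGT
"""

blocks = list(read_maf_blocks(io.StringIO(SAMPLE)))
kept = list(screen_blocks(blocks, {"frog", "chicken", "zebrafish"}, min_len=5))
print(len(blocks), len(kept))  # 2 1
```

The second toy block is rejected because it lacks a zebrafish row. In practice, further filters such as reading-frame or conservation checks would be stacked in the same generator style.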
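A jackknife analysis of sequence length works roughly like this. You draw many pseudo-alignments with a fixed number of columns from the concatenated data and infer a tree from each. You then record how often the turtle + Archosauria clade is recovered, increasing the length until support is consistently high. The following is a minimal sketch of the resampling step only, assuming the concatenated alignment is held as a Python dict of equal-length strings. Resampling individual sites, rather than whole genes, is an assumption here. Tree inference on each replicate would be done by an external ML program and is not shown.

```python
import random
from typing import Dict, List


def jackknife_replicates(alignment: Dict[str, str], length: int,
                         n_reps: int, seed: int = 0) -> List[Dict[str, str]]:
    """Return n_reps pseudo-alignments, each made of `length` columns drawn
    without replacement from the concatenated alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    total = lengths.pop()
    if not 0 < length <= total:
        raise ValueError("length must be between 1 and the alignment length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_reps):
        # The same column indices are used for every taxon; sorting keeps
        # the retained columns in their original order.
        cols = sorted(rng.sample(range(total), length))
        replicates.append({name: "".join(seq[i] for i in cols)
                           for name, seq in alignment.items()})
    return replicates


# Toy alignment; real input would be the ~21 kb concatenated data set.
toy = {"frog": "ACGTACGTAC", "turtle": "ACGTTCGTAC", "chicken": "ACGATCGTAC"}
for rep in jackknife_replicates(toy, length=6, n_reps=2, seed=1):
    print(rep)
```

Sampling without replacement is what distinguishes this delete-d jackknife from the bootstrap. Sweeping `length` (for example, in 1 kb steps) yields the support-versus-length curve from which a minimum length can be read.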

Highlights

  • In recent years, molecular markers, primarily DNA and derived protein sequences, have become a fundamental means to reconstruct many parts of the “Tree of Life.” Phylogenetic inference based on a single gene or a few genes is rarely robust and often leads to conflicting results (Rokas et al. 2003).

  • New nuclear protein-coding locus (NPCL) markers: the main multiple genome alignment (MGA) file used in this study was downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/xenTro1/multiz5way/

  • Twenty-one alignments were discarded because of apparent incongruence between their neighbor-joining (NJ) trees and the expected species tree (zebrafish, (frog, (chicken,))). This step should not bias the results for the placement of turtles, because the five-species tree is well established and does not involve turtles.
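The incongruence filter described in the highlights amounts to comparing two unrooted topologies. The following is a minimal sketch, assuming trees are given as nested Python tuples (for example, parsed from an NJ program's Newick output, which is not shown). It uses the Robinson-Foulds distance, the number of bipartitions found in one tree but not the other, as the incongruence measure. The study may have compared trees differently. The taxon labels A to E are hypothetical placeholders, not the five species in the alignment.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Nested tuples look rooted, but only the unrooted topology is compared.
Tree = Union[str, Tuple["Tree", ...]]


def _collect(tree: Tree, clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set under `tree`, appending every internal clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(_collect(child, clades) for child in tree))
    clades.append(below)
    return below


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions, each stored as the side lacking a fixed
    reference taxon so that different rootings give the same set."""
    clades: List[FrozenSet[str]] = []
    taxa = _collect(tree, clades)
    ref = min(taxa)
    result: Set[FrozenSet[str]] = set()
    for clade in clades:
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # both sides need >= 2 taxa
            result.add(side)
    return result


def rf_distance(tree_a: Tree, tree_b: Tree) -> int:
    """Robinson-Foulds distance: splits present in exactly one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def is_congruent(gene_tree: Tree, species_tree: Tree) -> bool:
    """True if both trees have the same taxa and unrooted topology."""
    if _collect(gene_tree, []) != _collect(species_tree, []):
        raise ValueError("trees must contain the same taxa")
    return rf_distance(gene_tree, species_tree) == 0


expected = (("A", "B"), ("C", ("D", "E")))    # hypothetical species tree
nj_tree_1 = ("A", ("B", ("C", ("D", "E"))))   # same unrooted topology
nj_tree_2 = (("A", "C"), ("B", ("D", "E")))   # conflicting topology
print(is_congruent(nj_tree_1, expected), is_congruent(nj_tree_2, expected))  # True False
```

Storing each split as the side without a reference taxon makes the comparison independent of where the NJ tree happens to be rooted. This matters because NJ trees are unrooted.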


Summary

Introduction

Molecular markers, primarily DNA and derived protein sequences, have become a fundamental means to reconstruct many parts of the “Tree of Life.” However, phylogenetic inference based on a single gene or a few genes is rarely robust and often leads to conflicting results (Rokas et al. 2003). This is partly because small data sets contain fewer characters and often suffer from stochastic errors related to the length of the data. Although every sampled gene may introduce systematic errors into a tree, these errors are expected to be randomly distributed across the whole tree. Because stochastic error diminishes as more genes are considered, the overall answer is still likely to be reliable.
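A toy calculation shows why stochastic error shrinks as genes are added. Assume each gene independently supports the correct resolution of a node with probability p. Also assume genes are summarized by majority vote, which simplifies the concatenation and species-tree methods actually used. The probability that the majority is correct then follows from the binomial distribution. The independence assumption and the value of p are illustrative, not estimates from the study.

```python
from math import comb


def p_majority_correct(n_genes: int, p: float) -> float:
    """Probability that strictly more than half of n_genes independent genes
    support the correct resolution, when each gene does so with probability p."""
    if n_genes < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_genes >= 1 and 0 <= p <= 1")
    first = n_genes // 2 + 1  # smallest strict majority
    return sum(comb(n_genes, k) * p**k * (1 - p)**(n_genes - k)
               for k in range(first, n_genes + 1))


for n in (1, 5, 21, 51):
    print(f"{n:>2} genes: {p_majority_correct(n, 0.6):.3f}")
```

With p = 0.6, one gene is right 60% of the time, while five genes give a correct majority with probability 0.68256 (about 0.683). For odd numbers of genes, the value keeps rising toward 1 as long as p > 0.5. If a shared bias pushes p below 0.5, the same arithmetic converges on the wrong answer. This is why adding genes removes stochastic error but not systematic error that is correlated across genes.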
