Abstract
Background: Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.
Results: A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.
Conclusion: Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.
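As an illustration of the idea behind gene tree parsimony, the following minimal Python sketch scores candidate species trees by the number of gene duplications needed to reconcile a set of gene trees with each candidate, and keeps the cheapest one. It is a toy under stated assumptions, not the procedure used in the study: trees are rooted, binary, nested tuples whose leaves are species names; only duplications are counted (losses are ignored); and the function names (`clades`, `lca`, `count_duplications`, `gtp_score`, `best_species_tree`) are hypothetical.

```python
def clades(tree):
    """Post-order list of leaf sets (as frozensets), one per node of the tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    lc, rc = clades(left), clades(right)
    # The last entry of each child list is that child's full leaf set.
    return lc + rc + [lc[-1] | rc[-1]]


def lca(species_clades, leaves):
    """Smallest species-tree clade that contains every species in `leaves`."""
    return min((c for c in species_clades if leaves <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species-tree node as a child."""
    sp = clades(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca(sp, leaves)
        (l_leaves, l_map), (r_leaves, r_map) = visit(node[0]), visit(node[1])
        leaves = l_leaves | r_leaves
        mapped = lca(sp, leaves)
        # A node sharing its mapping with a child implies a duplication.
        if mapped == l_map or mapped == r_map:
            dups += 1
        return leaves, mapped

    visit(gene_tree)
    return dups


def gtp_score(gene_trees, species_tree):
    """Total duplication cost of a candidate species tree (assumed cost model)."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


def best_species_tree(candidates, gene_trees):
    """Candidate species tree with the lowest total duplication cost."""
    return min(candidates, key=lambda s: gtp_score(gene_trees, s))


# Toy data: gene-tree leaves are labeled by the species they were sampled from.
gene_trees = [(("A", "B"), "C"), (("A", "A"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
print(best_species_tree(candidates, gene_trees))
```

A real analysis would search tree space heuristically rather than enumerate candidates, and would typically read gene trees from files instead of hand-written tuples.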
Highlights
Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families.
Though not universal, phylogeneticists' avoidance of the nuclear genome of plants is in no small part due to its relative complexity – mainly the frequent occurrence of paralogous copies of genes derived from gene duplications [4].
BMC Evolutionary Biology 2007, 7(Suppl 1):S3
For example, Arabidopsis may have undergone three complete genome doublings since the origin of seed plants, legumes two, and cereals two or more [5,6]. This contributes to already complex dynamics of gene family expansion and contraction driven by functional divergence [4].
Summary
Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. For example, Arabidopsis may have undergone three complete genome doublings since the origin of seed plants, legumes two, and cereals two or more [5,6]. This contributes to already complex dynamics of gene family expansion and contraction driven by functional divergence [4]. In Arabidopsis, 65% of genes are members of gene families [7], and because of silencing of alternative paralogs in other taxa, in addition to sporadic background rates of gene duplication, phylogenetic studies will undoubtedly sample even more duplications as they increase in taxonomic scope.