Analyses of whole-genome data often reveal that some genes have evolutionary histories that diverge from the majority phylogeny estimated for the entire genome. We present a probabilistic model that accounts for heterogeneity among gene trees, implement it via the Gibbs sampler, and apply it to the plastid genome. Plastids and their genomes are transmitted as a single block without recombination, so gene trees within this genome are expected to be homogeneous. Nevertheless, previous work has revealed clear heterogeneity among plastid genes (e.g., Delwiche and Palmer 1996). Other studies, using whole plastid genomes of various algae and land plants, found little additional heterogeneity (Martin et al. 1998; Adachi et al. 2000). We augment the earlier studies by using a data set of 14 taxa: 6 land plants, 2 green algae, a diatom, 2 red algae and a cryptophyte, the cyanelle of the glaucocystophyte Cyanophora, and the blue-green alga Synechocystis as an outgroup. Contrary to the earlier analyses, we cannot find a single dominant consensus tree. Therefore, we formulate a probabilistic model that divides the genes into two sets: those that follow the consensus tree and those that have independent gene trees. No particular tree is supported by more than three-fourths of the genes, but the set of genes that follows a given tree is fairly independent of data processing and the method of analysis. With one possible exception, we find no evidence that collinear or functionally related genes follow similar trees. The phylogenetic pattern also seems independent of bias in amino acid composition. Among possible explanations for the observed phenomenon, the hypothesis that different genes have different covarion structures is difficult to assess. Gene duplication, however, may be possible through the inverted or direct repeat regions, whereas horizontal gene transfer seems less likely.
In contrast to green algae and land plants, inverted repeat regions in red algae and in Cyanophora show abundant differences among the copies. Thus, genes may be duplicated when they are recruited into the inverted repeat region, and one of the two copies may be lost after leaving the inverted repeat region.