Abstract

The concept of homology lies at the root of evolutionary biology. Since the seminal work of Fitch (1970), three main categories of homology relationships have been defined at the molecular level: orthology, paralogy, and xenology. In brief, if two gene copies arose by duplication they are paralogs, whereas if they arose through speciation they are orthologs. If one of them was transferred from a contemporaneous species, we call them xenologs (Supplementary Fig. S1 in Supplementary Material online, available at http://dx.doi.org/10.5061/dryad.87k57; see Gray and Fitch (1983); Fitch (2000)). These terms were coined under a phylogenetic framework in which species were represented by single individuals, and as such they have remained largely intact during the last four decades, although particular cases within these categories have received specific names (Mindell and Meyer 2001). However, advances in sequencing technology have changed the field, and it is now very common to collect data sets containing multiple gene loci and/or multiple individuals per species. In general, such genome-wide data sets have not only unveiled extensive phylogenomic incongruence (Jeffroy et al. 2006; Salichos and Rokas 2013) but have also brought back into the spotlight the question of how ancestral polymorphisms sort within populations (Edwards 2009). Altogether, phylogenomic data make the explicit distinction between organismal and gene histories imperative.

Let us consider phylogenetic relationships at three different levels: species, loci, and gene copies (Fig. 1). The distinction between species/population trees and gene trees has been known for decades (Goodman et al. 1979; Pamilo and Nei 1988; Takahata 1989), whereas the introduction of locus trees into these models is very recent (Rasmussen and Kellis 2012). In brief, a species tree depicts the evolutionary history of the sampled organisms.
In this case, the nodes represent speciation events, connected by branches that reflect the population history between them; branch widths represent effective population size (Ne) and branch lengths represent time (usually in years or number of generations). Apart from speciation, only evolutionary processes that affect species as a whole, such as hybridization, are represented at this level. Note that species trees are equivalent to population trees when the organismal units of interest are conspecific populations. In this case, the nodes of the population tree represent isolation events. In general, we will refer to “species” as any diverging, interbreeding group of individuals regardless of its taxonomic rank.

On the other hand, a locus tree represents the evolutionary history of the sampled loci for a given gene family (see Rasmussen and Kellis 2012). Since the loci exist inside individuals evolving as part of a population, the locus tree is embedded within the species tree. In a locus tree, the nodes depict either genetic divergence due to speciation in the embedding species tree or locus-level events such as duplications, losses, or horizontal gene transfers, whereas the branch lengths and widths represent time and Ne, respectively. Here, we assume that locus-level events are fixed immediately in the population, so these Ne values are equivalent to those in the species tree and are the same for every locus.

Finally, a gene tree represents the evolutionary history of the sampled gene copies that evolve inside the locus tree. Gene tree nodes indicate coalescent events, which, looking forward in time, correspond to DNA replication and divergence, and which can occur around the speciation time, well before it (deep coalescence), or afterwards (migration in population trees). The branches of the gene tree usually represent the number of substitutions per site, but can also represent the number of generations or other measures of time.
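As an illustration only, the following Python sketch shows one way the node labels of a locus tree could be used to name the relationship between two sampled loci. It is not the authors' method: the class `LocusNode`, the event labels, and the example tree are hypothetical, and the rule is deliberately simplified to inspect only the event at the most recent common ancestor of the two loci. It therefore ignores gene-level processes such as incomplete lineage sorting, and it would miss a transfer that occurred below that ancestor.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from the event at a locus-tree node to the
# relationship between loci that descend from its different children.
RELATION_BY_EVENT = {
    "speciation": "orthologs",
    "duplication": "paralogs",
    "transfer": "xenologs",
}


@dataclass
class LocusNode:
    """A node in a toy locus tree; leaves have no event and no children."""
    name: str
    event: Optional[str] = None
    children: List["LocusNode"] = field(default_factory=list)


def leaf_names(node):
    """Return the set of leaf names found below (or at) this node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def mrca(node, a, b):
    """Return the deepest node whose leaf set contains both a and b."""
    for child in node.children:
        if {a, b} <= leaf_names(child):
            return mrca(child, a, b)
    return node


def classify(root, a, b):
    """Name the relationship between two distinct leaves of the tree."""
    names = leaf_names(root)
    if a == b or a not in names or b not in names:
        raise ValueError("expected two distinct leaves of the tree")
    return RELATION_BY_EVENT[mrca(root, a, b).event]


# Hypothetical example: a speciation at the root, a duplication on one
# side, and a horizontal transfer from species B to species C on the other.
root = LocusNode("root", "speciation", [
    LocusNode("dup_in_A", "duplication", [LocusNode("A_1"), LocusNode("A_2")]),
    LocusNode("transfer_B_to_C", "transfer", [LocusNode("B_1"), LocusNode("C_1")]),
])

print(classify(root, "A_1", "A_2"))
print(classify(root, "A_1", "B_1"))
print(classify(root, "B_1", "C_1"))
```

Keeping the event label on the node rather than on the leaves mirrors the description above, in which locus tree nodes are either speciations inherited from the species tree or locus-level events.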
Importantly, these three historical layers do not necessarily coincide. True species/population trees can differ from true locus trees due to gene duplications, losses, and/or horizontal gene transfers, whereas true gene trees can differ from their embedding locus and species trees if there is incomplete lineage sorting (ILS) (Maddison 1997; Page and Charleston 1997) (and migration in the case of population trees). In this regard,
