Looking forwards or looking backwards in avian phylogeography? A comment on Zink and Barrowclough 2008

STAFFAN BENSCH, SCOTT EDWARDS

doi:10.1111/j.1365-294x.2009.04270.x

Abstract

In a recent review, Zink & Barrowclough (2008, hereafter ZB08) addressed the question of whether mitochondrial DNA (mtDNA) was sufficient to describe the phylogeographic history of avian species. They offered a number of conclusions, most of which were variants of the central tenet that mtDNA is sufficient to describe the broad picture of avian phylogeographies. Specifically, they concluded that, for questions of phylogeographic pattern, mtDNA does quite well on its own as a descriptor of geographical and taxonomic patterns, and that ‘the case for the primacy of nuclear variation for studies of phylogeography is not so clear to us’. The motivation for their review was a concern about the conclusions of several recent reviews (e.g. Edwards et al. 2005) summarizing several well-known limitations of mtDNA, including evidence for pervasive natural selection, the use of a single maternally inherited molecule to describe population histories and uncertainty over whether the mitochondrial gene trees obtained in hundreds of phylogeographic studies are representative of the histories of the relevant species or populations. One of the main arguments on which their confidence in mtDNA is based is their claim that nuclear genes are ‘lagging indicators’ of population structure compared with mtDNA: the majority of cases in birds (and presumably other taxa) reveal that mtDNA, a ‘leading indicator’ of population structure, exhibits reciprocal monophyly or high differentiation between taxa when nuclear DNA (nuDNA) does not. This enhanced resolving power, they suggest, is reason enough to be content with mitochondrial gene trees in delimiting patterns in species and phylogeographic history. In contrast, they concede that questions of phylogeographic process—demographic histories, gene flow, effective population sizes, population expansions, presumably speciation itself—‘will require multiple loci to provide robust estimates’. 
Although we agree with several points made by ZB08, particularly with regard to the enhanced utility of sequence-based markers vs. microsatellites (Brito & Edwards 2009), we outline in this comment a number of reasons why their argument for being content with mtDNA in describing patterns of avian phylogeographic history is flawed. First, we suggest that the distinction between phylogeographic pattern and process is a straw man and ultimately impossible to implement in a practical sense. Here, we point out that very few, if any, avian phylogeographic studies based solely on mtDNA—including many in which ZB08 are authors (e.g. Pavlova et al. 2005; Zink et al. 2008), and including their re-analysis of Bensch et al.’s (2006) data in their own review—have resisted the urge to comment on avian phylogeographic process. Next, we point out that even when the focus of phylogeographic studies is on pattern, statistical common sense suggests that when the desire is to make statements about populations or taxa—entities at a higher level of organization than the gene—one must sample multiple loci, even though we acknowledge that the field has for many years been comfortable doing so with a single locus. Drawing on recently published examples of multilocus studies in humans, we show that by describing nuclear genes as lagging indicators, ZB08 ignore the primary power of these markers—their multiplicity. In doing so, they fail to acknowledge that combining information from multiple nuclear genes can provide much finer descriptions of phylogeographic pattern than can the single locus provided by mtDNA. In summary, we suggest that the view of phylogeography espoused by ZB08 is a backward-looking one, one that ignores the ease with which nuclear sequence data can be collected even now—notwithstanding the imminent flood of nuDNA sequence data to be unleashed by modern sequencing technologies, especially for non-model species. 
Zink & Barrowclough (2008) make a clear distinction early in their review between phylogeographic pattern vs. process, and suggest that mtDNA is usually sufficient for interrogating the former. However, we suggest that because pattern is a product of process, even the simplest of phylogeographic patterns described in the literature have elicited some statement of process, making their preference for separating the two unworkable. For example, a phylogeographic pattern found in some studies of avian mtDNA is paraphyletic or polyphyletic mtDNA lineages—instances in which the sampled taxa do not exhibit reciprocal monophyly for mtDNA (e.g. Ödeen & Bjorklund 2003). [In their table 2, ZB08 list the wagtails (Motacilla) in Ödeen & Bjorklund (2003) as exhibiting reciprocal monophyly, when in fact flava is paraphyletic relative to the very distinct species citreola and cinerea, and the mtDNA and nuDNA patterns for citreola are different from one another]. Virtually any interpretation of this pattern requires some statement as to process. For example, is the lack of monophyly due to recent isolation or homogenizing gene flow after secondary contact? Are we witnessing an ongoing, spatially restricted hybridization; a temporally restricted pulse of hybridization or the type of ongoing and temporally uniform gene flow usually estimated in stepping-stone models? On its own, the observed pattern is very limited in its utility, for example, in understanding biogeography or species limits or for informing conservation. To ZB08, a well-resolved reciprocally monophyletic mitochondrial gene tree implies a history of isolation without gene flow—the most common process associated with this pattern. But is genetic distinctiveness useful on its own without any reference to process? For example, should we be comfortable delimiting species based on monophyly or lack thereof of mtDNA, without speculating as to how the pattern arose? 
As a case in point, in the greenish warbler, mtDNA exhibits two very distinct clades breaking the ring both north and south of the Himalayan range, but nuclear markers demonstrate smooth isolation by distance (Irwin et al. 2005), suggesting that the sharp mtDNA breaks are the result of stochastic processes. In addition to the fact that reciprocal monophyly can sometimes be driven by forces other than geographical isolation (e.g. natural selection or stochastic events), we suggest that the utility of pattern without process in cases of mtDNA monophyly is low and that even the most rudimentary interpretations of mtDNA pattern require statements of process, something that ZB08 suggest requires nuclear genes. We regard as unrealistic ZB08’s suggestion that the uses of mtDNA can be addressed by restricting inferences solely to pattern. In most early phylogeographic studies in birds, including those published by us (Edwards 1993; Bensch & Hasselquist 1999), mtDNA alone was used to describe not only phylogeographic pattern but also process. We challenge ZB08 to identify a single mtDNA-only study that has refrained from discussing phylogeographic process; the many recent phylogeographic studies by Zink and colleagues on birds in Europe, Russia and Beringia all make statements of process, for example when they estimate population growth using mismatch distributions. In their analysis of sequence data in Bensch et al. (2006), ZB08 disregard their own recommendation and discuss process from an analysis of mtDNA variation in Phylloscopus. They write ‘we subjected the [mitochondrial] data to the R2 analysis for population growth … and found it highly significant’. That they are comfortable with a conclusion of population growth despite it being based solely on mtDNA shows that even they are unwilling to follow their own recommendations.
ZB08 suggested that recent reviews of mtDNA constituted an ‘indictment’ of its use in phylogeography. We suggest that, because of phylogeographers’ natural interest in process and the high priority they place on it, ZB08’s conviction that mtDNA addresses pattern more than process is even more of an indictment of previous avian phylogeographic studies than is our suggestion that mtDNA is limited in its power to describe population history on its own (Edwards et al. 2005).
The preceding discussion raises a related point: even phylogeographers who claim to focus solely on pattern do so in the hope of making some statement about populations or species. Few would argue with the claim that the relevant focus in phylogeography, species delimitation, even phylogeny, is the population or species, not the gene or (often small) set of genes used to make inferences. Species by their very nature consist of many genes, each of which records a signal of the demographic history. But the pattern observed in each gene is the result of demographic factors acting on the whole genome, as well as random fluctuations due to genetic drift and the frequently complicating patterns generated by nonhierarchical processes such as introgression. An often-overlooked aspect of mtDNA-only phylogeographic studies is that, technically, we are not warranted in making statements about anything other than the mtDNA; when we do, our assumption is that other markers will follow the mtDNA pattern. Early researchers in the field, such as Avise and others, knew this and routinely proffered the caveat that mtDNA does not record, for example, the history of males in species with maternal inheritance of mtDNA. As another example of this point, when a mitochondrial gene tree receives high bootstrap support, most researchers are tempted to claim that the population harbouring that gene tree is also monophyletic, yet such an inference is unwarranted on both inferential and statistical grounds.
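The statistical side of this point, that any one gene is a single noisy draw from the history shared by the whole genome, can be illustrated with a small simulation. The sketch below is not taken from ZB08 or from any study cited here; it assumes the standard neutral coalescent for a single constant-size, randomly mating population, with no recombination within a locus and free recombination between loci. The function names (`simulate_tmrca`, `expected_tmrca`) and the sample sizes are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_tmrca(n_lineages, n_loci, rng):
    """Simulate the time to the most recent common ancestor at independent loci.

    Times are in coalescent units. While i lineages remain, the waiting time
    to the next coalescence is exponential with rate i * (i - 1) / 2
    (assumption: neutral, constant-size, panmictic population).
    """
    i = np.arange(2, n_lineages + 1)
    rates = i * (i - 1) / 2.0
    # One row per locus, one column per coalescent interval.
    waits = rng.exponential(1.0 / rates, size=(n_loci, rates.size))
    return waits.sum(axis=1)


def expected_tmrca(n_lineages):
    """Theoretical mean of the same quantity: 2 * (1 - 1 / n)."""
    return 2.0 * (1.0 - 1.0 / n_lineages)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tmrca = simulate_tmrca(n_lineages=10, n_loci=20000, rng=rng)
    print("theoretical mean:", expected_tmrca(10))
    print("spread among single loci (5th, 50th, 95th percentiles):",
          np.percentile(tmrca, [5, 50, 95]).round(2))
    # The standard error of a multilocus average shrinks as 1 / sqrt(loci).
    for n_loci in (1, 10, 100):
        print(n_loci, "loci: standard error of the mean =",
              round(tmrca.std(ddof=1) / np.sqrt(n_loci), 3))
```

Under these assumptions the depth of a single gene tree varies widely among loci that share exactly the same demographic history, whereas the average over many loci is far more stable. A single locus, whether mitochondrial or nuclear, therefore supports statements about that locus much more securely than statements about the population.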
The contrast between inferences of genes and populations when using mtDNA is also illustrated by the availability of multilocus methods now for estimating species trees, as opposed to gene trees (Edwards 2008; Liu et al. 2008); analysis of data with species tree methods suggests that the power of claims about population and lineage monophyly rests not on a single highly resolved gene tree but on the genealogical patterns at many loci and the accumulated signal among them, even in the face of extensive incomplete lineage sorting. In addition, there is always the possibility that the gene and species trees will differ; striking examples of mtDNA clades inferring groups wrongly come from the many examples of displaced clines for mtDNA and morphology or nuclear genes (see Ruegg 2008, for a recent example in birds). Directly extrapolating from the gene to the species was forgivable in the early days of phylogeography, but we suggest that, given the relative ease of assaying variation at nuclear markers now, making inferences from a single gene to statements about populations or species is unwarranted today, and that ZB08 have become unduly complacent with such inferences. Zink & Barrowclough (2008) spend much of their review cataloguing the diversity of patterns of differentiation of nuclear and mitochondrial genes in avian phylogeographic studies, and come to the conclusion that ‘Many studies found that the two sets of markers (nuclear and mitochondrial) were either concordant or the mitochondrial estimate of a statistic exceeded the equivalent nuclear one’. They use this finding to suggest that ‘In cases in which a structured mtDNA gene tree is accompanied by unstructured nuclear trees, the former ought to be taken as indicative of lineage divergence; mtDNA is the leading indicator, nuDNA the trailing one’. However, this contrast of mtDNA vs. nuclear genes is misleading because it focuses only on gene-by-gene contrasts. 
In doing so, it fails to acknowledge (i) that the most useful measures of population differentiation rely specifically on the accumulated signals from many genes and (ii) that mtDNA is only a single gene in a population genetic sense. What one supposedly loses in the signal in any one nuclear gene relative to mtDNA one gains back by many orders of magnitude by adding further nuclear genes to this signal. Many popular phylogeographic methods, especially those that focus appropriately on estimating parameters of populations and species rather than genes, are known to deliver more accurate and reliable estimates of population parameters with multiple than with single loci (Edwards & Beerli 2000; Hey & Nielsen 2004; Maddison & Knowles 2006). Even when such methods are focused solely on pattern, as in population delimitation methods such as Structure (Pritchard et al. 2000) or principal components analysis (PCA), multiple genes invariably provide a more sensitive signal of population boundaries and numbers than does mtDNA alone. The increased resolving power of multiple nuclear genes vs. the single gene provided by mtDNA is well illustrated by recent phylogeographic studies in humans. For example, Novembre et al. (2008) analysed large numbers of single nucleotide polymorphism (SNP) markers in European populations to study aspects of phylogeographic pattern. PCA and other methods revealed highly diagnosable groups on a spatial scale that was hitherto unachievable with mtDNA or any other limited set of markers. Previous studies of Europeans show that none of the populations discernible in Novembre et al. (2008) exhibits mtDNA monophyly; indeed, in humans, mtDNA (and Y-chromosome) monophyly is known only for the major branches in the human phylogeographic gene tree. Thus, the accumulated signal of multiple nuclear genes reveals details of phylogeographic pattern that far exceed what is provided by mtDNA alone (Rosenberg 2009).
This is particularly the case when the among-population variation of any one nuclear marker is slight (FST among European geographical regions in the study of Novembre et al. (2008) was only 0.004); indeed formulas for estimating the number of markers required to differentiate populations with a given FST are known (Patterson et al. 2006). We acknowledge that most studies in birds or other non-model groups have yet to accumulate as many markers as are available in humans, but the number of accessible markers for birds and other groups is rapidly increasing (Backström et al. 2008) and will no doubt further increase as sequencing technology evolves. On the other hand, humans are among the most unstructured of vertebrate species, and it is likely that the population structure of most nonhuman animals will be resolvable with far fewer markers than are required by human phylogeography. The number of nuclear markers employed in avian phylogeographic studies has been steadily climbing and now rivals the 30–40 that were routinely characterized in allozyme studies of birds in the 1980s (Lee & Edwards 2008). Furthermore, the ability of nuclear markers to delimit populations at a higher level of resolution than is possible with mtDNA alone has been observed in some studies of birds as well as in other non-model species (e.g. Wang et al. 2003). An obvious issue that arises is how contingent the resolved groups are on the number and nature of nuclear markers used; although we acknowledge that this is an unresolved issue of importance to phylogeography, the point that nuclear genes together have more resolving power than does mtDNA still stands. Many studies in humans and other groups now use multivariate, multilocus statistical methods other than gene trees to delimit and describe phylogeographic pattern.
For example, when there are no multiple hits, the frequency spectra of SNPs for single populations or the joint spectra for multiple populations are sufficient statistics for data from single or multiple populations, respectively (i.e. they contain all the information present in the original data). ZB08 presumably failed to acknowledge the higher resolving power of nuclear genes because of their focus on gene trees, which as they and others have shown will often exhibit monophyly for mtDNA but not for nuDNA. Although gene trees are a useful way of describing phylogeographic diversity, they are not the only tools available and indeed many recent studies are moving away from gene trees as the main tool for describing phylogeographic pattern. There are many ways to diagnose species and populations other than via gene trees, and, for example, the accumulated signal of allele frequency shifts across many nuclear loci—a signal that mtDNA alone cannot possess—is now becoming the most powerful and sensitive means for delimiting diversity and pattern within species. To suggest that nonphylogenetic methods such as PCA or STRUCTURE have no place in phylogeography is to unnecessarily restrict the purview and methods that enrich the field. ZB08 point to other challenges of nuclear genes, such as the need for phasing unresolved haplotypes with multiple heterozygous loci. Although we acknowledge that phasing is indeed an uncertainty and that errors could lead to misestimation of nuclear gene trees (Hare 2001), the effect of such errors on other types of phylogeographic analysis is not known, and may be minimal. Regardless, because new sequencing technologies such as 454 determine the sequences of single DNA molecules, rather than collections of molecules as in dideoxy sequencing of PCR products, they will eliminate the phasing problem, which will soon be a thing of the past (Brito & Edwards 2009). 
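The argument that allele-frequency shifts accumulated across many loci can resolve weakly differentiated populations lends itself to a short numerical illustration. The sketch below is not the analysis of Novembre et al. (2008) or Patterson et al. (2006). It assumes two equally sized populations whose allele frequencies follow the Balding-Nichols model, unlinked biallelic markers, and an approximate detection threshold of the form FST ≈ 1/√(n × m) for n markers and m individuals; that form should be checked against Patterson et al. (2006) before being relied on. All function names and parameter values are hypothetical.

```python
import numpy as np


def markers_at_threshold(fst, n_individuals):
    """Markers at which structure becomes detectable, ASSUMING the
    threshold fst ~ 1 / sqrt(n_markers * n_individuals)."""
    return int(np.ceil(1.0 / (n_individuals * fst ** 2)))


def simulate_genotypes(n_per_pop, n_markers, fst, rng):
    """Diploid genotypes (0, 1 or 2 copies) for two populations.

    Assumption: population allele frequencies are Beta-distributed around
    an ancestral frequency (Balding-Nichols model), markers are unlinked.
    """
    ancestral = rng.uniform(0.1, 0.9, size=n_markers)
    a = ancestral * (1.0 - fst) / fst
    b = (1.0 - ancestral) * (1.0 - fst) / fst
    freqs = rng.beta(a, b, size=(2, n_markers))
    geno = np.vstack([
        rng.binomial(2, freqs[k], size=(n_per_pop, n_markers))
        for k in range(2)
    ])
    labels = np.repeat([0, 1], n_per_pop)
    return geno, labels


def pc1_separation(geno, labels):
    """Gap between population means on PC1, in pooled within-population SDs."""
    g = geno[:, geno.std(axis=0) > 0].astype(float)  # drop monomorphic markers
    g = (g - g.mean(axis=0)) / g.std(axis=0)         # standardise each marker
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    p0, p1 = pc1[labels == 0], pc1[labels == 1]
    pooled_sd = np.sqrt((p0.var(ddof=1) + p1.var(ddof=1)) / 2.0)
    return abs(p0.mean() - p1.mean()) / pooled_sd


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # FST = 0.004 is the value quoted in the text; 1000 individuals is hypothetical.
    print("markers at assumed threshold:", markers_at_threshold(0.004, 1000))
    for n_markers in (20, 200, 5000):
        geno, labels = simulate_genotypes(50, n_markers, 0.01, rng)
        print(n_markers, "markers: PC1 separation =",
              round(pc1_separation(geno, labels), 2))
```

In this toy setting no single marker distinguishes the two populations, yet the first principal component separates them cleanly once enough markers are combined. The threshold function describes only when structure becomes detectable under the stated assumption, not how many markers are needed to assign individuals reliably.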
In summary, we find the review by ZB08 to be backward looking, when phylogeography, like any science, should be forward looking. ZB08 seem comfortable with the resolving power of mtDNA, but many empirical studies of phylogeographic pattern show that multiple nuclear genes have substantially more resolving power than does mtDNA alone. ZB08 favour the particular level of resolution provided by mtDNA alone; but the particular level of resolution provided by mtDNA is ultimately arbitrary, and one could just as easily favour the resolving power of a nuclear gene depending on one’s motives, especially given that subspecies boundaries in birds tend not to conform to mtDNA clades (Zink 2004). We believe that the large number of mtDNA-only studies in birds and other groups is a strong first step for these groups. Inevitably, however, if the field is healthy, phylogeography and its methods and tools will change through time. We suggest that the favouritism towards mtDNA displayed in ZB08 encourages becoming comfortable with a tool that has proven extremely useful, but is ultimately limited, and that can now be supplemented relatively easily by the enormous signal available in the many loci of the nuclear genome.
