Congruence versus phylogenetic accuracy: revisiting the incongruence length difference test.

Andrew L Hipp, Kenneth J Sytsma, Jocelyn C Hall, François Lutzoni

doi:10.1080/10635150490264752

Open Access

Abstract

Phylogenies inferred from independent data partitions usually differ from one another in topology despite the fact that they are drawn from the same set of organisms (Rodrigo et al., 1993). Some topological differences are due to sampling error or to the use of inappropriate phylogenetic models. These types of topological incongruence do not have their origin in genealogical discordance, i.e., differences between phylogenies underlying the respective data partitions (Baum et al., 1998). Incongruence that is not due to genealogical discordance can often be addressed by modifying the model used in phylogenetic reconstruction (Cunningham, 1997b), and combining data is an appropriate way of dealing with random topological differences that are attributable to sampling error. However, other topological differences, e.g., those arising from lineage sorting (Maddison, 1997; Avise, 2000) and hybridization (Dumolin-Lapegue et al., 1997; Rieseberg, 1997; McKinnon et al., 1999; Avise, 2000), reflect genealogical discordance between the data partitions. Most systematists consider data partitions to be combinable if and only if they are not strongly incongruent with one another (Sytsma, 1990; Bull et al., 1993; Huelsenbeck et al., 1996; Baum et al., 1998; Johnson and Soltis, 1998; Thornton and DeSalle, 2000; Yoder et al., 2001; Barker and Lutzoni, 2002; Buckley et al., 2002). Systematists who follow this prior agreement or conditional combination approach to analyzing multiple data partitions (Bull et al., 1993; Huelsenbeck et al., 1996; Johnson and Soltis, 1998) evaluate incongruence using tests such as the incongruence length difference (ILD) test (Farris et al., 1994, 1995) or other tests of taxonomic congruence (Templeton, 1983; Kishino and Hasegawa, 1989; Larson, 1994; Shimodaira and Hasegawa, 1999) before deciding whether the partitions should be analyzed in combination. Data that exhibit strong incongruence are then analyzed separately or under assumptions that minimize incongruence (Cunningham, 1997b).

In their article “Failure of the ILD to determine data combinability for slow loris phylogeny,” Yoder et al. (2001) critiqued the ILD test based on the observation that it will sometimes identify data partitions as incongruent when in fact those partitions combine to produce an accurate estimate of organismal phylogeny. They described the ILD test as a failed test of data combinability, maintaining that the presumed accuracy of trees inferred from combined data indicates the congruence of the data partitions. We have two objections to their argument (2001:421) that “the ILD [should] never be used as a test of data partition combinability.” First, what Yoder et al. described as a flaw in the ILD test as applied to their data, i.e., an apparent inverse relationship between phylogenetic accuracy and data partition congruence as measured by the ILD test, turns out to be an artifact of analysis. There is in fact a bimodal relationship between congruence and accuracy: as either data partition is upweighted, homoplasy in the combined data set is swamped by homoplasy within the upweighted data partition, reducing the significance of the ILD test. At the same time, the topology of the combined analysis shifts to reflect the topology of the upweighted data partition. This phenomenon is predictable and can be accounted for in the analysis (Dowton and Austin, 2002). Second, Yoder et al.’s expectation that ILD test results should predict the phylogenetic accuracy of the combined data analysis is unreasonable. The ILD test is used to evaluate the null hypothesis that characters that make up two or more data partitions are drawn at random from a single population of characters, i.e., a population of characters that reflects a single phylogeny and a single set of evolutionary processes (Farris et al., 1995).
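The null hypothesis just described can be illustrated with a toy permutation test. The sketch below is not the implementation used by Farris et al. or by any phylogenetics package. It assumes only four taxa, so that the most parsimonious tree can be found by checking all three unrooted topologies with Fitch parsimony. All function names are hypothetical, and the p-value convention `(hits + 1) / (replicates + 1)` is an assumption; published implementations may count replicates differently. The statistic is the length of the shortest tree for the combined data minus the summed lengths of the shortest trees for each partition.

```python
import random

# The three unrooted topologies for four taxa, each written as two cherries.
QUARTET_TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(left, right):
    """One Fitch step: return (state set, cost) for the parent of two nodes."""
    common = left & right
    return (common, 0) if common else (left | right, 1)


def char_length(states, topology):
    """Minimum number of changes for one character on one quartet topology."""
    (a, b), (c, d) = topology
    ab, cost_ab = fitch_join({states[a]}, {states[b]})
    cd, cost_cd = fitch_join({states[c]}, {states[d]})
    _, cost_root = fitch_join(ab, cd)
    return cost_ab + cost_cd + cost_root


def mp_length(chars):
    """Length of the shortest tree, by exhaustive search over 3 topologies."""
    return min(sum(char_length(ch, t) for ch in chars)
               for t in QUARTET_TOPOLOGIES)


def ild_statistic(part_a, part_b):
    """Combined tree length minus the sum of the separate tree lengths."""
    return mp_length(part_a + part_b) - (mp_length(part_a) + mp_length(part_b))


def ild_test(part_a, part_b, replicates=999, seed=0):
    """Permutation test: reshuffle characters into partitions of the
    original sizes and count replicates at least as extreme as observed."""
    rng = random.Random(seed)
    observed = ild_statistic(part_a, part_b)
    pooled = part_a + part_b
    hits = 0
    for _ in range(replicates):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        perm_a = shuffled[:len(part_a)]
        perm_b = shuffled[len(part_a):]
        if ild_statistic(perm_a, perm_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Each character is a string of states for taxa 0..3.
conflicting = ild_test(["AACC"] * 10, ["ACAC"] * 10)  # partitions favor different trees
congruent = ild_test(["AACC"] * 10, ["AACC"] * 10)    # partitions favor the same tree
```

In this toy setting the conflicting partitions give a statistic of 10 extra steps, while the congruent partitions give 0. Note that the test says nothing about whether either topology matches the true organismal history; it only asks whether the characters look like draws from one pool.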
Because accuracy of trees derived from a data set depends on many factors other than congruence among data partitions, the ILD test cannot be used to directly address questions related to phylogenetic accuracy. Genealogically discordant data can be combined to yield accurate phylogenies, whereas data that are congruent (both genealogically concordant and homogeneous in underlying evolutionary process) can be combined to yield phylogenies that do not accurately represent organismal history (Cunningham, 1997a). A damaging critique of the ILD test would have to appeal to criteria other than
