Abstract
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
Highlights
Explicit [1] or evolutionary [2, 3] phylogenetic networks are used to represent the evolution of organisms or genes that may inherit genetic material from more than one source
Phylogenetic Networks: What Can Be Reconstructed phylogenetic networks: most notably, the relative order of consecutive reticulate events cannot be determined by standard network inference methods
We formally prove that each network has a canonical form— where a number of nodes and branches are merged—representing all that can be uniquely reconstructed about the original network
Summary
Explicit [1] or evolutionary [2, 3] phylogenetic networks are used to represent the evolution of organisms or genes that may inherit genetic material from more than one source This may be caused by events such as hybrid speciation (e.g. in plants and animals [4, 5]), horizontal gene transfer (e.g. in bacteria [6, 7]), viral reassortment [8], or recombination (e.g. in viruses [9, 10] or in the genomes of sexually reproducing species [11,12,13]).
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.