In phylogenetics, evolution is traditionally represented in a tree-like manner. However, phylogenetic networks can be more appropriate for representing evolutionary events such as hybridization, horizontal gene transfer, and other reticulate processes. In particular, the class of forest-based networks was recently introduced to represent introgression, in which genes are exchanged between species. A network is forest-based if it can be obtained by adding arcs to a collection of trees so that the endpoints of each new arc lie in different trees. This contrasts with so-called tree-based networks, which are formed by adding arcs within a single tree.

We are interested in the computational complexity of recognizing forest-based networks, which Huber et al. recently left as an open problem. It has been observed that forest-based networks coincide with directed acyclic graphs that can be partitioned into induced paths, each ending at a leaf of the original graph. Several types of path partitions have been studied in the graph theory literature, but to the best of our knowledge this type of ‘leaf induced path partition’ has not been directly considered before. Studying forest-based networks in terms of these partitions allows us to establish closer relationships between phylogenetics and algorithmic graph theory, and to answer problems in both fields.

More specifically, we show that deciding whether a network is forest-based is NP-complete, even on input networks that are tree-based, binary, and have only three leaves. This shows that partitioning a directed acyclic graph into a constant number of induced paths is NP-complete, answering a recent question of Fernau et al. We then show that the problem is polynomial-time solvable on binary networks with two leaves and on the recently introduced class of orchards, which we show are always forest-based. Finally, for undirected graphs, we introduce unrooted forest-based networks and provide hardness results for this class as well.
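To make the ‘leaf induced path partition’ characterization concrete, here is a minimal sketch of checking a *proposed* partition. It assumes a simple DAG given as a vertex list and a list of `(u, v)` arcs, and paths given as vertex lists in path order; the function name `is_leaf_induced_path_partition` and this input format are hypothetical, not taken from the paper. Verifying a given partition like this takes polynomial time, which places recognition in NP; the hard part, per the results above, is finding such a partition.

```python
from collections import defaultdict


def is_leaf_induced_path_partition(vertices, arcs, paths):
    """Hypothetical checker: True if `paths` is a leaf induced path partition.

    Assumes a simple DAG: `vertices` is an iterable of hashable names,
    `arcs` is an iterable of (u, v) pairs meaning u -> v, and each path
    in `paths` lists its vertices from first to last.
    """
    vertices = set(vertices)
    arc_set = set(arcs)
    out_degree = defaultdict(int)
    for u, _ in arc_set:
        out_degree[u] += 1

    # The paths must cover every vertex exactly once.
    covered = [v for path in paths for v in path]
    if len(covered) != len(set(covered)) or set(covered) != vertices:
        return False

    for path in paths:
        if not path:
            return False
        members = set(path)
        # Consecutive vertices must be joined by an arc, in path order.
        consecutive = {(path[i], path[i + 1]) for i in range(len(path) - 1)}
        if not consecutive <= arc_set:
            return False
        # Induced: no other arc may join two vertices of the same path.
        inner = {(u, v) for (u, v) in arc_set if u in members and v in members}
        if inner != consecutive:
            return False
        # Each path must end at a leaf, i.e. a vertex with out-degree 0.
        if out_degree[path[-1]] != 0:
            return False
    return True


# Illustrative binary network: root r, tree vertex a, reticulation b
# (parents r and a), leaves x and y.
V = ["r", "a", "b", "x", "y"]
A = [("r", "a"), ("r", "b"), ("a", "x"), ("b", "y"), ("a", "b")]

print(is_leaf_induced_path_partition(V, A, [["r", "a", "x"], ["b", "y"]]))       # True
print(is_leaf_induced_path_partition(V, A, [["r", "a", "b", "y"], ["x"]]))       # False: chord r -> b
print(is_leaf_induced_path_partition(V, A, [["r", "a"], ["b", "y"], ["x"]]))     # False: a is not a leaf
```

Because a leaf has no outgoing arcs, it can only be the last vertex of a directed path, so every path in such a partition contains exactly one leaf. The number of paths therefore equals the number of leaves, which is why networks with three leaves correspond to partitions into a constant number of induced paths.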