Abstract
Most approaches to reconstructing evolutionary history are based on the infinite sites assumption, which states that each mutation arises only once in the evolutionary history. The Perfect Phylogeny model follows from the infinite sites assumption and has been widely used to infer cancer evolution. Nonetheless, recent results show that recurrent and back mutations are present in the evolutionary history of tumors, so the Perfect Phylogeny model might be too restrictive. We propose an approach that allows both the loss of previously acquired mutations and multiple acquisitions of a character. Moreover, we provide an ILP formulation for the evolutionary tree reconstruction problem. Our formulation allows us to tackle both the Incomplete Directed Phylogeny problem and the Clonal Reconstruction problem when general evolutionary models are considered. The latter problem is fundamental in cancer genomics: the goal is to study the evolutionary history of a tumor, taking as input the fraction of cells that carry a given mutation in each of a set of cancer samples. For the Clonal Reconstruction problem, an experimental analysis shows the advantage of allowing mutation losses. Namely, on both real and simulated datasets, our ILP approach provides a better interpretation of the evolutionary history than a Perfect Phylogeny. The software is available at https://github.com/AlgoLab/gppf.
Highlights
Our experiments show that the Persistent Phylogeny that we compute usually provides a better interpretation of the input data than the Perfect Phylogeny: it has a smaller overall error while requiring a number of clones that is smaller than the number of mutations.
The character-based phylogeny reconstruction problems we study in this paper are constrained versions of the general Incomplete Directed Perfect Phylogeny (IDP) [25].
In this paper we have proposed an integer linear programming (ILP) formulation of the problem of reconstructing the evolutionary history of tumors, where the evolutionary tree is character-based and can violate the infinite sites assumption of the Perfect Phylogeny model.
Summary
...problems in Bioinformatics, with a large literature [12, 15, 28, 30] focusing on a simple assumption: the input data consists of a set of species (or individuals), and for each of them we know the set of characters that it possesses. A possible generalization (that we do not explore in this paper) is the multi-state perfect phylogeny, which has recently been proposed to take into account the effect of copy number aberrations on alleles [10]. In this new model, known as the infinite allele assumption, the characters can assume different states (i.e., the number of copies of a site) but, as in the binary case, a change to a given state can occur only once. This restriction makes efficient algorithms possible, but the most recent studies refute it [21] and state that more complex models are needed to describe tumor evolution. The tree inferred from real data on the leukemia tumor CLL077 reveals mutation losses, although it is mostly consistent with the tree reconstructed by other known methods [20].
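To make the binary restriction concrete, the following is a minimal sketch of the classical pairwise conflict test. It assumes complete binary data and an all-zero root. Under those assumptions a species-by-character matrix admits a Perfect Phylogeny exactly when no two characters show all three of the patterns (1, 1), (1, 0) and (0, 1). This is not the ILP formulation of the paper; the function names and toy matrices are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the pairs of character (column) indices that conflict.

    Assumption: rows are species, columns are binary characters, and the
    root carries no mutation. Two characters conflict when the rows show
    all of the patterns (1, 1), (1, 0) and (0, 1) for that pair.
    """
    n_chars = len(matrix[0]) if matrix else 0
    conflicts = []
    for a, b in combinations(range(n_chars), 2):
        seen = {(row[a], row[b]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= seen:
            conflicts.append((a, b))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True when no pair of characters conflicts."""
    return not conflicting_pairs(matrix)


# Hypothetical toy data: characters 0 and 1 are nested, character 2 is disjoint.
nested = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Hypothetical toy data: the two characters overlap without being nested,
# so explaining them needs a mutation loss or a recurrent mutation.
overlapping = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(admits_perfect_phylogeny(nested))
print(conflicting_pairs(overlapping))
```

A matrix such as `overlapping` is the kind of input for which a model allowing mutation losses can still produce a tree, whereas the Perfect Phylogeny model cannot explain it without error.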
More From: IEEE/ACM Transactions on Computational Biology and Bioinformatics