Perfect Phylogeny Research Articles

Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria, which replicate, even in nondividing cells, by a relatively error-prone process that produces somatic mutations in their genome. Because the mitochondrial genome mutates at a higher rate than the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly with single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model "heteroplasmy," the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provide a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree. While deconvolution of somatic mutations from a mixture of evolutionarily related genomes has been extensively studied in the context of bulk sequencing of tumor samples, the problem of mitochondrial deconvolution has the additional constraint that the mitochondrial clone tree must be concordant with the cell lineage tree. We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the Nested Perfect Phylogeny Mixture (NPPM) problem. We derive a combinatorial characterization of the solutions to the NPPM problem, and formulate an algorithm, MERLIN, to solve this problem exactly using a mixed integer linear program. We show on simulated data that MERLIN outperforms existing methods that model neither mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. We use MERLIN to analyze single-cell whole-genome sequencing data of 5220 cells from a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree than existing methods. MERLIN is available at https://github.com/raphael-group/MERLIN.
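
To make the idea of a cell as a mixture of mitochondrial clones concrete, here is a minimal Python sketch. It is not MERLIN's model or code: the clone names, the toy clone tree, the proportions, and both helper functions are hypothetical, and the sketch assumes each non-root clone introduces exactly one mutation (named after the clone) that all of its descendants inherit.

```python
def clone_genotype(clone, parent):
    """Return the mutations carried by a clone under a perfect phylogeny:
    its own mutation plus those of all its ancestors (the root has none)."""
    mutations = set()
    while parent[clone] is not None:
        mutations.add(clone)
        clone = parent[clone]
    return mutations


def mutation_frequencies(proportions, parent):
    """Frequency of each mutation in one cell, given the proportion of
    each mitochondrial clone present in that cell."""
    freqs = {}
    for clone, proportion in proportions.items():
        for mutation in clone_genotype(clone, parent):
            freqs[mutation] = freqs.get(mutation, 0.0) + proportion
    return freqs


# Hypothetical clone tree: root -> a -> b, and root -> c.
parent = {"root": None, "a": "root", "b": "a", "c": "root"}

# Hypothetical heteroplasmic cell: half unmutated, a quarter each of a and b.
cell = {"root": 0.5, "a": 0.25, "b": 0.25}

print(mutation_frequencies(cell, parent))
```

In this toy cell, mutation `a` is observed at frequency 0.5 because both clone `a` and its descendant `b` carry it, which is why observed frequencies alone do not directly reveal the clones.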

Background: Tumors exhibit extensive intra-tumor heterogeneity, the presence of groups of cellular populations with distinct sets of somatic mutations. This heterogeneity is the result of an evolutionary process, described by a phylogenetic tree. In addition to enabling clinicians to devise patient-specific treatment plans, phylogenetic trees of tumors enable researchers to decipher the mechanisms of tumorigenesis and metastasis. However, the problem of reconstructing a phylogenetic tree T given bulk sequencing data from a tumor is more complicated than the classic phylogeny inference problem. Rather than observing the leaves of T directly, we are given mutation frequencies that are the result of mixtures of the leaves of T. The majority of current tumor phylogeny inference methods employ the perfect phylogeny evolutionary model. The underlying Perfect Phylogeny Mixture (PPM) combinatorial problem typically has multiple solutions.

Results: We prove that determining the exact number of solutions to the PPM problem is #P-complete and hard to approximate within a constant factor. Moreover, we show that sampling solutions uniformly at random is hard as well. On the positive side, we provide a polynomial-time computable upper bound on the number of solutions and introduce a simple rejection-sampling based scheme that works well for small instances. Using simulated and real data, we identify factors that contribute to and counteract non-uniqueness of solutions. In addition, we study the sampling performance of current methods, identifying significant biases.

Conclusions: Awareness of non-uniqueness of solutions to the PPM problem is key to drawing accurate conclusions in downstream analyses based on tumor phylogenies. This work provides the theoretical foundations for non-uniqueness of solutions in tumor phylogeny inference from bulk DNA samples.
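
The non-uniqueness described above can be seen on a toy instance. The following Python sketch is a brute-force enumeration, not the counting or sampling scheme from the article. It assumes the sum condition commonly used in the PPM literature, which the abstract does not spell out: a tree is a solution if, in every sample, the frequency of each mutation is at least the sum of the frequencies of its children. The mutation names, the frequencies, and all function names are hypothetical.

```python
from itertools import product


def is_tree(parent, root):
    """Check that following parent pointers from every mutation reaches the root."""
    for v in parent:
        seen = set()
        while v != root:
            if v in seen:  # revisited a node: cycle or self-loop
                return False
            seen.add(v)
            v = parent[v]
    return True


def satisfies_sum_condition(parent, samples, tol=1e-9):
    """In each sample, a node's frequency must cover the sum of its children's."""
    for sample in samples:
        child_sum = {}
        for child, par in parent.items():
            child_sum[par] = child_sum.get(par, 0.0) + sample[child]
        if any(total > sample[par] + tol for par, total in child_sum.items()):
            return False
    return True


def enumerate_solutions(samples, root):
    """Try every parent assignment (exponential; only for tiny instances)."""
    nodes = list(samples[0])
    mutations = [m for m in nodes if m != root]
    solutions = []
    for choice in product(nodes, repeat=len(mutations)):
        parent = dict(zip(mutations, choice))
        if is_tree(parent, root) and satisfies_sum_condition(parent, samples):
            solutions.append(parent)
    return solutions


# Hypothetical frequencies; "r" is the root (present in all cells).
sample1 = {"r": 1.0, "a": 0.5, "b": 0.25}
sample2 = {"r": 1.0, "a": 0.25, "b": 0.5}

print(enumerate_solutions([sample1], "r"))           # two trees fit one sample
print(enumerate_solutions([sample1, sample2], "r"))  # only one fits both
```

With `sample1` alone, both the branching tree (a and b as children of r) and the chain r, a, b satisfy the condition. In this toy instance, adding `sample2` rules out the chain, leaving a single solution.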

Articles published on Perfect Phylogeny

The Bayesian Phylogenetic Bootstrap, Application to Short Trees and Branches.

Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data.

Three Metaheuristic Approaches for Tumor Phylogeny Inference: An Experimental Comparison

Enumeration of binary trees compatible with a perfect phylogeny

Strong inequalities and a branch-and-price algorithm for the convex recoloring problem

Finding Optimal Triangulations Parameterized by Edge Clique Cover

Accuracy in Near-Perfect Virus Phylogenies.

Combinatorial perspectives on Dollo-k characters in phylogenetics

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Tumor Phylogeny Topology Inference via Deep Learning.

PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem.

Combinatorial views on persistent characters in phylogenetics

Finding Most Compatible Phylogenetic Trees over Multi-State Characters

No background in biology is assumed

Clustering the biological networks using shortest path.

PerfectphyloR: An R package for reconstructing perfect phylogenies

PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.

Implications of non-uniqueness in phylogenetic deconvolution of bulk DNA samples of tumors

Does Relaxing the Infinite Sites Assumption Give Better Tumor Phylogenies? An ILP-Based Comparative Approach.

Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach
