Abstract

Motivation: The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to identify these constraints, and return Directed Acyclic Graphs (DAGs) of restrictions where arrows indicate dependencies or constraints. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g. those with reciprocal sign epistasis) cannot be represented by CPMs.

Results: Using simulated data under 500 fitness landscapes, I show that CPMs’ performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime and fitness landscape features, in ways that depend on CPM method. Using three cancer datasets, I show that these problems strongly affect the analysis of empirical data: fitness landscapes that are widely different from each other produce data similar to the empirically observed ones and lead to DAGs that infer very different restrictions. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.

Availability and implementation: Code available from Supplementary Material.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Epistatic interactions between genetic alterations constrain the order of accumulation of mutations during cancer progression

  • Finding these constraints can single out therapeutic targets and disease markers and has led to the development of cancer progression models (CPMs; Beerenwinkel et al., 2015), such as CBN (Gerstung et al., 2009, 2011) or CAPRI (Caravagna et al., 2016; Ramazzotti et al., 2015), that try to identify these constraints using genotype frequency data from cross-sectional samples

  • The choice of seven genes has several justifications: it is the number of genes in the pancreatic cancer dataset; the execution time of CBN increases steeply with the number of genes beyond about seven genes (Diaz-Uriarte, 2015); seven genes is probably close to the upper limit of fitness landscapes that can be visualized and related to their true Directed Acyclic Graphs (DAGs); and, if the number of genes affects the problems reported in this article, those problems are likely to become worse with increasing numbers of genes



Introduction

Epistatic interactions between genetic alterations constrain the order of accumulation of mutations during cancer progression (e.g. in colorectal cancer APC mutations precede KRAS mutations; Fearon and Vogelstein, 1990). Finding these constraints can single out therapeutic targets and disease markers and has led to the development of cancer progression models (CPMs; Beerenwinkel et al., 2015), such as CBN (Gerstung et al., 2009, 2011) or CAPRI (Caravagna et al., 2016; Ramazzotti et al., 2015), that try to identify these constraints using genotype frequency data from cross-sectional samples.
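To make the idea of a DAG of restrictions concrete, the following is a minimal sketch of how such a DAG determines which genotypes can exist. It is not the CBN or CAPRI implementation. It assumes conjunctive restrictions (a gene can mutate only after all of its parents in the DAG have mutated). The APC-before-KRAS dependency comes from the example above; the TP53 dependency, the `restrictions` dictionary and both function names are hypothetical and included only for illustration.

```python
from itertools import combinations

# Hypothetical DAG of restrictions: each gene maps to the set of genes that
# must already be mutated before it can mutate (conjunctive assumption).
# Only APC -> KRAS is taken from the text; KRAS -> TP53 is illustrative.
restrictions = {
    "APC": set(),
    "KRAS": {"APC"},
    "TP53": {"KRAS"},
}


def is_allowed(genotype, restrictions):
    """Return True if every mutated gene has all of its parents mutated."""
    return all(restrictions[gene] <= genotype for gene in genotype)


def allowed_genotypes(restrictions):
    """Enumerate all genotypes (sets of mutated genes) the DAG permits."""
    genes = sorted(restrictions)
    result = []
    for k in range(len(genes) + 1):
        for combo in combinations(genes, k):
            genotype = frozenset(combo)
            if is_allowed(genotype, restrictions):
                result.append(genotype)
    return result


if __name__ == "__main__":
    for genotype in allowed_genotypes(restrictions):
        print(sorted(genotype))
```

Under this linear DAG only four of the eight possible genotypes are allowed: the wild type, APC alone, APC with KRAS, and all three genes. A genotype with KRAS mutated but APC unmutated is excluded, which is the kind of prediction a CPM makes about genotypes that can exist.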
