Abstract
Background
A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease. Cancer is often caused by an accumulation of mutations that irreversibly damage the cell's mechanisms controlling survival and proliferation. Different mutations may affect these cellular mechanisms through a combination of molecular interactions, which may change dynamically during cancer progression. It has previously been shown that cancer accumulates mutations over time. In this paper we address the problem of cancer heterogeneity by modeling cancer progression using cross-sectional somatic mutation and gene expression data.
Results
We propose a novel formulation that integrates somatic mutation and gene expression data to infer the temporal sequence of events from cross-sectional data. Using a mixed integer linear program (MILP), we model the interaction between groups of different mutated genes and the resulting modifications at the gene expression level. Our approach identifies a partition of mutation events that gradually produce gene expression changes in a partition of genes over time. The proposed formulation is tested on both simulated data and real breast cancer data with matched somatic mutation and gene expression measurements from The Cancer Genome Atlas (TCGA). First, we classify genes as oncogenes or tumor suppressors based on the frequency of driver mutations. As expected, the most frequently mutated genes in breast cancer are PIK3CA and TP53. Then, we select the genes with the most frequent driver mutations together with a set of genes known to play roles in cancer development. Next, we apply the proposed MILP to identify the temporal order in which genes mutate and, simultaneously, the changes they produce at the gene expression level during cancer progression.
In addition, we are able to identify known causal relationships between mutations and gene expression changes in the PI3K/AKT and TP53 pathways.
Conclusions
This paper proposes a new model to infer the temporal sequence in which mutations occur and lead to changes at the gene expression level during cancer progression. The approach is general and can be applied to any data set with available somatic mutation and gene expression measurements.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-016-0255-6) contains supplementary material, which is available to authorized users.
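The Results classify genes as oncogenes or tumor suppressors "based on the frequency of driver mutations", but the abstract does not state the exact rule. One widely used frequency-based heuristic is the 20/20 rule: a gene is called an oncogene if more than 20% of its recorded mutations are missense at recurrent positions, and a tumor suppressor if more than 20% are inactivating. The sketch below assumes that rule; the mutation-type labels, the recurrence cutoff of two hits, the tie-break, `classify_gene`, and the toy genes are all hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical mutation-type labels; real annotation files use their own vocabulary.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_gene(mutations, threshold=0.20, min_recurrence=2):
    """Label one gene from its list of (protein_position, mutation_type) records.

    Sketch of the 20/20 heuristic (assumed, not taken from the paper):
    - oncogene score: fraction of all mutations that are missense at a position
      hit at least `min_recurrence` times;
    - tumor suppressor score: fraction of all mutations that are inactivating.
    A score above `threshold` qualifies; the larger qualifying score wins.
    """
    n = len(mutations)
    if n == 0:
        return "unclassified"
    missense_hits = Counter(pos for pos, kind in mutations if kind == "missense")
    recurrent_missense = sum(c for c in missense_hits.values() if c >= min_recurrence)
    inactivating = sum(1 for _, kind in mutations if kind in INACTIVATING)

    oncogene_score = recurrent_missense / n
    tsg_score = inactivating / n
    if max(oncogene_score, tsg_score) <= threshold:
        return "unclassified"
    return "oncogene" if oncogene_score >= tsg_score else "tumor suppressor"


# Toy records for two hypothetical genes (not real TCGA data).
GENE_A = [(50, "missense")] * 6 + [(80, "missense")] * 2 + [(10, "missense"), (120, "nonsense")]
GENE_B = [(5, "nonsense"), (40, "frameshift"), (77, "splice_site"), (90, "nonsense"),
          (130, "frameshift"), (12, "missense"), (33, "missense"), (61, "missense"),
          (102, "missense"), (150, "missense")]

if __name__ == "__main__":
    print(classify_gene(GENE_A))  # oncogene
    print(classify_gene(GENE_B))  # tumor suppressor
```

GENE_A concentrates 8 of its 10 mutations at two recurrent missense positions (score 0.8), while GENE_B has 5 of 10 inactivating mutations and no recurrent missense hotspot (score 0.5), so each crosses the 20% cutoff on a different criterion.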
Highlights
A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease
We address the issue of cancer heterogeneity by using both somatic mutation and gene expression data from cross-sectional measurements and propose a new Mixed Integer Linear Program (MILP) formulation to model the molecular progression of cancer
The MILP formulation identifies the order in which mutations appear and produce gene expression changes
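The MILP itself is not spelled out in this abstract. To make the underlying idea of ordering events from cross-sectional data concrete, the sketch below is a toy stand-in, not the authors' formulation: it assumes a strictly monotone progression model in which each mutation and each expression change is assigned to one of `n_stages` ordered stages, each sample sits at the stage that best explains it, and the fit counts disagreements between observed and idealized profiles. It solves the assignment by exhaustive search rather than with a MILP solver; `fit_progression`, `mismatch`, and the toy matrices are hypothetical.

```python
from itertools import product

# Toy cross-sectional data (hypothetical, not TCGA): rows are samples.
# MUT columns: mutated genes A, B, C. EXPR columns: expression changes in genes X, Y.
MUT = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0]]
EXPR = [[1, 0], [1, 1], [1, 1], [0, 0], [1, 0]]


def mismatch(mut_row, expr_row, mut_stage, expr_stage, s):
    """Disagreements between one sample and the idealized profile of stage s.

    Assumed monotone model: at stage s, every event assigned to a stage <= s is
    present and every later event is absent. Stage 0 means no events.
    """
    errors = sum(obs != (stage <= s) for obs, stage in zip(mut_row, mut_stage))
    errors += sum(obs != (stage <= s) for obs, stage in zip(expr_row, expr_stage))
    return errors


def fit_progression(mut, expr, n_stages):
    """Exhaustively search stage assignments (1..n_stages) for every event.

    Returns (total_errors, mutation_stages, expression_stages, sample_stages).
    The search is exponential in the number of events, so it only suits toy
    inputs; the paper instead poses its model as a MILP for a solver.
    """
    stages = range(1, n_stages + 1)
    best = None
    for mut_stage in product(stages, repeat=len(mut[0])):
        for expr_stage in product(stages, repeat=len(expr[0])):
            total, sample_stages = 0, []
            for m_row, e_row in zip(mut, expr):
                # Place each sample at the stage (0..n_stages) that explains it best.
                cost, s = min(
                    (mismatch(m_row, e_row, mut_stage, expr_stage, s), s)
                    for s in range(n_stages + 1)
                )
                total += cost
                sample_stages.append(s)
            if best is None or total < best[0]:
                best = (total, mut_stage, expr_stage, sample_stages)
    return best


if __name__ == "__main__":
    print(fit_progression(MUT, EXPR, n_stages=3))
    # (0, (1, 2, 3), (1, 2), [1, 2, 3, 0, 1])
```

On this noise-free toy data the search recovers mutation A and expression change X at stage 1, B and Y at stage 2, and C at stage 3, with no mismatches. Because it enumerates `n_stages` raised to the number of events (243 assignments here), it is only a teaching aid for the ordering idea.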
Summary
A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease. Other difficulties include clinical samples that may not always be properly annotated; the small sample size relative to the large number of measured features (genes or other molecules), which decreases the statistical power of most commonly used approaches for biomarker discovery and patient stratification; and the heterogeneity of cancer mechanisms, which may change dynamically during cancer progression. Although not all patients with the same type of cancer harbor exactly the same set of genetic abnormalities, at least a subset of these changes appears to be consistent across groups of patients. This suggests that several combinations of mutations and gene expression patterns may lead to similar changes in cancer initiation and progression mechanisms such as apoptosis, differentiation, migration, and proliferation. Different molecular patterns may contribute to cancer growth and proliferation in a similar way, i.e., through the deregulation of similar cellular mechanisms.