Abstract

Computational cancer phylogenetics can play an important role in delineating possible tumor progression pathways and identifying molecular subtypes and mechanisms of action. We previously developed a pipeline for constructing tumor phylogenies from recurring cell types computationally inferred from whole-genome copy number data. The accuracy and detail of these tumor phylogenies, however, depend on the identification of accurate, high-resolution molecular markers of progression, i.e., reproducible regions of copy number variation that can robustly distinguish subtypes and stages of progression. Here we present a new method for this problem that uses hidden Markov models (HMMs) to derive robust, high-resolution progression markers from sets of tumor samples. We demonstrate our method on a publicly available array comparative genomic hybridization (aCGH) dataset (NCBI GEO GSE16672, Navin et al., 2010) from sectioned primary ductal breast tumors. Our method uses an HMM, a class of probabilistic models, to classify each copy number probe in a set of aCGH profiles as normal (diploid) or amplified in each sample. It differs from similar HMM methods primarily in seeking a parsimonious set of combinations of amplification states able to explain all aCGH profiles simultaneously, in order to identify robust markers of progression across samples. The model learns the frequencies with which different combinations of amplifications are observed across the samples by modeling individual probes as Gaussian random variables with either diploid (normal) or tetraploid means: data more consistent with the tetraploid mean are classified as amplified, and data more consistent with the diploid mean are classified as normal.
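As a rough illustration of the per-probe classification idea only, the following sketch decodes a single aCGH profile with a two-state HMM (normal vs. amplified) and Gaussian emissions. It is not the joint multi-sample model described here: the function names, the use of log2 ratios with means 0.0 (diploid) and 1.0 (tetraploid), the noise level `sd`, and the self-transition probability `stay` are all assumptions chosen for the example.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(log_ratios, means=(0.0, 1.0), sd=0.3, stay=0.95):
    """Label each probe 0 (normal) or 1 (amplified) by Viterbi decoding.

    means: assumed log2 ratios for diploid (0.0) and tetraploid (1.0) probes.
    sd, stay: illustrative noise level and self-transition probability.
    """
    if not log_ratios:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the two states at the first probe.
    score = [math.log(0.5) + gaussian_logpdf(log_ratios[0], m, sd) for m in means]
    backptr = []
    for x in log_ratios[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Score of reaching state s from each possible predecessor state.
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + gaussian_logpdf(x, means[s], sd))
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state to recover the full labeling.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical log2-ratio profile: three normal probes, four amplified, two normal.
profile = [0.05, -0.1, 0.02, 0.9, 1.1, 0.95, 1.05, 0.0, 0.1]
labels = viterbi_two_state(profile)
print(labels)
```

With these illustrative parameters, a run of probes near the tetraploid mean is labeled amplified, while an isolated moderately elevated probe is absorbed into the surrounding normal segment because two state switches cost more than the single probe gains. Extending this to many samples at once would multiply the number of joint states, which is the difficulty the sampling step of the method is meant to address.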
To handle the combinatorial explosion in combinations of amplification states as the number of samples increases, the method introduces a Gibbs sampling algorithm to learn a parsimonious model of the most frequently occurring combinations of amplification states. We applied our method to a previously constructed set of inferred cell types derived from the Navin et al. data (Tolliver et al., 2010) and to a comparison set of 9 random samples from the raw aCGH data. We validated our model against manual labeling of amplicons on the same data. In both experiments, the HMM method identified significantly more robustly amplified segments per chromosome than prior methods or manual analysis. The resulting segments can be fed directly into downstream analysis routines for phylogeny inference or other predictions. In future work, the HMM method may be improved by fine-tuning the underlying model of copy number variation.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 3964. doi:1538-7445.AM2012-3964
