Abstract

We have developed a new method that uses high-throughput reads that span multiple somatic point mutations to reconstruct multiple, genetically diverse subclonal populations from one or more heterogeneous tumor samples. Subclonal reconstruction algorithms attempt to infer the prevalence and genotype of multiple, genetically related subclonal populations using the variant allele frequency (VAF) of somatic variants. To date, these algorithms exclusively use data on individual somatic mutations. This restriction greatly reduces their ability to fully resolve phylogenetic ambiguities. In some cases, it is possible to determine the mutation status of more than one mutation in a single cell, for example, when single reads cover multiple single nucleotide variants (SNVs). This type of information can add considerable power to the phylogenetic reconstruction of the tumor subclonal population. We have developed the PhyloSpan algorithm, which attempts to infer the states of multiple SNVs in single cells and then exploits that information in subclonal reconstruction. Our algorithm starts by phasing somatic SNVs, looking for reads or read pairs that cover both a somatic mutation and a germline heterozygous single nucleotide polymorphism (SNP). These germline SNPs are often available through profiling of normal tissue. PhyloSpan then identifies SNVs that are on the same chromosome and close enough to be covered by a single read or by paired reads. These pairs of mutations provide more phylogenetic certainty than can be found by looking at mutations independently. For example, if those SNVs are found in the same evolutionary branch, then we expect to see some reads containing both mutations. If, however, the SNVs are on separate branches, then no reads should show both SNVs. PhyloSpan integrates this phylogenetic information with the VAF of each somatic SNV to perform subclonal reconstruction.
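
The read-level reasoning above can be illustrated with a minimal sketch. This is not the PhyloSpan implementation, which the abstract describes as a Bayesian non-parametric model; it is a simple counting heuristic under stated assumptions. The read representation (a dict mapping position to observed base), the function names `tally_pair` and `classify_pair`, and the `min_support` threshold are all hypothetical, and the sketch assumes both SNVs have already been phased to the same chromosome copy.

```python
from collections import Counter


def tally_pair(reads, pos_a, alt_a, pos_b, alt_b):
    """Count reads spanning both sites by which variant alleles they carry.

    `reads` is a hypothetical iterable of dicts mapping a genomic position
    to the base observed at that position (an assumption of this sketch).
    """
    counts = Counter()
    for read in reads:
        if pos_a not in read or pos_b not in read:
            continue  # read does not cover both sites, so it is uninformative
        has_a = read[pos_a] == alt_a
        has_b = read[pos_b] == alt_b
        counts[(has_a, has_b)] += 1
    return counts


def classify_pair(counts, min_support=2):
    """Heuristic call for a pair of SNVs assumed phased to the same haplotype.

    `min_support` is an illustrative threshold meant to guard against
    sequencing errors; it is not a value taken from the abstract.
    """
    both = counts[(True, True)]
    only_a = counts[(True, False)]
    only_b = counts[(False, True)]
    if both >= min_support:
        # Reads carrying both variants: the SNVs share an evolutionary branch.
        return "same_branch"
    if both == 0 and only_a >= min_support and only_b >= min_support:
        # Each variant is seen, but never together: consistent with
        # the SNVs lying on separate branches.
        return "separate_branches"
    return "ambiguous"


reads = [
    {100: "T", 150: "G"},
    {100: "T", 150: "G"},
    {100: "T", 150: "C"},
    {100: "A"},  # covers only the first site, so it is skipped
]
counts = tally_pair(reads, pos_a=100, alt_a="T", pos_b=150, alt_b="G")
print(classify_pair(counts))  # → same_branch
```

A hard threshold like this discards the uncertainty that a probabilistic model would retain, so it serves only to show which read patterns carry phylogenetic signal.
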
Incorporating these various types of information requires a rigorous statistical approach, and so we have developed a Bayesian non-parametric tree-based clustering algorithm. This algorithm not only infers the number of subclonal populations and their genotypes but also provides a measure of uncertainty about this inference, enabling users to determine which parts of the subclonal reconstruction are certain and which parts remain ambiguous. While the number of SNVs that lie within a short read's length of another SNV is small, a handful of such pairs is all that is needed to eliminate a substantial amount of ambiguity in subclonal reconstruction. Furthermore, long-read technologies, such as PacBio, can be used to supplement short reads. Our approach generalizes to permit the integration of single-cell sequencing with bulk tumor sequencing. We will present results applying our algorithm to whole-genome sequencing data, showing the added value of considering multiple SNVs jointly compared to treating SNVs independently.

Citation Format: Amit G. Deshwar, Levi Boyles, Jeff Wintersinger, Paul C. Boutros, Yee Whye Teh, Quaid Morris. PhyloSpan: using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4865. doi:10.1158/1538-7445.AM2015-4865
