Abstract

A tumor cell population is a mixture of heterogeneous cell subpopulations, known as subclones. Identifying the clonal status of mutations, i.e., whether a mutation occurs in all tumor cells or only in a subset of them, is crucial for understanding tumor progression and for developing personalized treatment strategies. We make three major contributions in this paper: (1) we summarize the terminology in the literature based on a unified mathematical representation of subclones; (2) we develop a simulation algorithm that generates hypothetical sequencing data resembling real data; and (3) we present an ultra-fast computational method, Mutstats, that infers the clonal status of somatic mutations from tumor sequencing data. The inference is based on a Gaussian mixture model for mutation multiplicities. To validate Mutstats, we evaluate its performance on simulated datasets and on two breast carcinoma samples from The Cancer Genome Atlas (TCGA) project.
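
The abstract says only that inference rests on a Gaussian mixture model for mutation multiplicities. As a rough illustration of that idea, and not Mutstats' actual procedure, the sketch below converts a variant allele fraction $v$ into a multiplicity estimate with the standard relation $m = v\,(p\,c + 2(1-p))/p$, where $p$ is tumor purity and $c$ the local total tumor copy number (a diploid normal genome is assumed). It then fits a two-component 1-D Gaussian mixture by expectation-maximization and calls the higher-mean component clonal. The function names, the initial means, and the "higher mean means clonal" rule are hypothetical choices for illustration; in particular, that rule ignores clonal mutations carried at multiplicity 2 or more after copy-number gains.

```python
import math
from typing import List, Sequence, Tuple


def multiplicity(vaf: float, purity: float, total_cn: float, normal_cn: float = 2.0) -> float:
    """Invert E[VAF] = p*m / (p*c + normal_cn*(1-p)) to estimate the multiplicity m."""
    return vaf * (purity * total_cn + normal_cn * (1.0 - purity)) / purity


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(xs: Sequence[float], init_means: Sequence[float], init_sd: float = 0.2,
               max_iter: int = 200, tol: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by expectation-maximization; returns (weights, means, sds)."""
    k = len(init_means)
    weights, mus, sds = [1.0 / k] * k, list(init_means), [init_sd] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
            total = sum(dens)
            if total == 0.0:  # point far from every component: split it evenly
                resp.append([1.0 / k] * k)
                continue
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # empty component: keep its previous parameters
                continue
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing to zero width
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, mus, sds


def call_clonal_status(xs: Sequence[float], weights: Sequence[float],
                       mus: Sequence[float], sds: Sequence[float]) -> List[str]:
    """Assign each SNV to its most probable component; the highest-mean one is called clonal (assumption)."""
    clonal = max(range(len(mus)), key=lambda j: mus[j])
    calls = []
    for x in xs:
        post = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
        best = max(range(len(post)), key=lambda j: post[j])
        calls.append("clonal" if best == clonal else "subclonal")
    return calls


# Hypothetical sample: purity 0.8, copy-neutral (c = 2) regions.
vafs = [0.38, 0.41, 0.40, 0.43, 0.39, 0.11, 0.13, 0.12, 0.10, 0.14]
ms = [multiplicity(v, purity=0.8, total_cn=2) for v in vafs]
w, mu, sd = fit_gmm_1d(ms, init_means=[0.3, 1.0])
print(call_clonal_status(ms, w, mu, sd))  # five "clonal" followed by five "subclonal"
```

With purity 0.8 and $c = 2$ the denominator is $0.8 \cdot 2 + 2 \cdot 0.2 = 2$, so a heterozygous clonal SNV at $v \approx 0.4$ maps to $m \approx 1$, while a subclonal SNV present in 30% of tumor cells ($v = 0.12$) maps to $m = 0.3$.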

Highlights

  • A tumor cell population is known to be a mixture of heterogeneous cell subpopulations (Nowell, 1976; Marusyk and Polyak, 2010; Swanton, 2012; Yates and Campbell, 2012)

  • $S$ is the total number of single nucleotide variants (SNVs) in a simulated sample, $y_s^{\mathrm{true}}$ is the true clonal status of SNV $s$, and $y_s$ is the clonal status of SNV $s$ determined by Mutstats
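
The highlight defines these symbols but not the score computed from them. Assuming the score is the fraction of correctly called SNVs, $\mathrm{Acc} = \frac{1}{S}\sum_{s=1}^{S} \mathbb{1}\{y_s = y_s^{\mathrm{true}}\}$, a minimal sketch follows; the function name `clonal_status_accuracy` is hypothetical.

```python
from typing import Sequence


def clonal_status_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of the S SNVs whose called status y_s equals the true status y_s^true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have one entry per SNV")
    if not y_true:
        raise ValueError("need at least one SNV")
    S = len(y_true)
    return sum(t == p for t, p in zip(y_true, y_pred)) / S


truth = ["clonal", "clonal", "subclonal", "subclonal"]
calls = ["clonal", "subclonal", "subclonal", "subclonal"]
print(clonal_status_accuracy(truth, calls))  # 0.75: three of four SNVs called correctly
```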

Summary

Introduction

A tumor cell population is known to be a mixture of heterogeneous cell subpopulations (Nowell, 1976; Marusyk and Polyak, 2010; Swanton, 2012; Yates and Campbell, 2012). A mutation is called clonal if it occurs in all tumor cells, and subclonal if it occurs in only a subset of them. Existing methods use a number of different terms for these concepts, each defined under a different characterization of tumor heterogeneity. Many existing methods, such as …

Mutstats: Ultra-fast clonal status caller and data simulator.

Simulated datasets with clearly labeled clonal status could greatly facilitate the development of novel inferential algorithms, both as a sanity check and for comparison with alternative methods in the field.
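
To make the idea of labeled simulated data concrete, here is a deliberately simple generator. It is not the paper's simulation algorithm, which is designed to produce data akin to real data; every parameter below (purity, depth distribution, copy-number mix, subclonal cancer cell fractions drawn uniformly from 0.1 to 0.9) and the names `SimulatedSNV` and `simulate_snvs` are hypothetical. Each SNV carries one mutated copy, its expected VAF is $p \cdot \mathrm{CCF} / (p\,c + 2(1-p))$, and alternate reads are drawn as $\mathrm{Binomial}(\text{depth}, \mathrm{VAF})$.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SimulatedSNV:
    depth: int
    alt_reads: int
    total_cn: int
    true_status: str  # ground-truth label: "clonal" or "subclonal"


def simulate_snvs(n_clonal: int, n_subclonal: int, purity: float = 0.8,
                  mean_depth: int = 100, seed: int = 0) -> List[SimulatedSNV]:
    """Draw read counts for SNVs whose clonal status is known by construction."""
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    snvs = []
    for status in ["clonal"] * n_clonal + ["subclonal"] * n_subclonal:
        total_cn = rng.choice([2, 2, 2, 3, 4])  # hypothetical copy-number mix, mostly diploid
        ccf = 1.0 if status == "clonal" else rng.uniform(0.1, 0.9)  # cancer cell fraction
        vaf = purity * ccf * 1 / (purity * total_cn + 2 * (1 - purity))  # one mutated copy
        depth = max(1, int(rng.gauss(mean_depth, mean_depth ** 0.5)))
        alt = sum(rng.random() < vaf for _ in range(depth))  # Binomial(depth, vaf) draw
        snvs.append(SimulatedSNV(depth, alt, total_cn, status))
    return snvs


data = simulate_snvs(n_clonal=30, n_subclonal=20, seed=1)
print(data[0])
```

Because the labels are fixed before any reads are drawn, a caller's output can be scored directly against `true_status`.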

Representation of Subclones
Data Preparation
Model for Mutation Multiplicities
Determining Clonal Status
Brief Review of Data Simulation Approaches
Performance on Simulated data
Sensitivity Analysis of H1 and H2
Comparison with Existing Methods
PyClone 75th-tile
TCGA BRCA Data Analysis
Table columns: Method, Clonal, Subclonal, Total, Time
Findings
Discussion