Abstract

Background: Subpopulations of tumor cells, characterized by distinct mutation profiles, may differ in their fitness under treatment and thereby affect prognosis across cancers. Understanding subclonal architecture has the potential to provide biological insight into tumor evolution and to advance the precision treatment of cancers. Recent methods integrate single nucleotide variants (SNVs) and copy number alterations (CNAs) to reconstruct subclonal architecture from whole-genome sequencing (WGS) data of bulk tumor samples. However, most of these methods follow a Bayesian framework and require extensive computational resources, a priori knowledge of the number of subclones, and ad hoc post-analysis data processing. Together, these requirements create a processing-time bottleneck in large-scale studies.

Objectives: The primary objective of this study is to introduce a fast and accurate subclonal architecture reconstruction method that uses a model-based clustering approach and addresses all of the limitations above.

Methods: We introduce a novel model-based clustering method, Clonal structure identification through pairwise Penalization (CliP). CliP assumes that the number of reads observed with the variant allele follows a binomial model whose success probability is a function of the mutation's cellular prevalence (CP), the CNAs at its locus, and tumor purity. We propose to minimize a penalized negative log-likelihood of this model, with a smoothly clipped absolute deviation (SCAD) penalty on the differences in CP across pairs of mutations. The optimization problem is then solved efficiently via the alternating direction method of multipliers (ADMM). As a subclonal reconstruction algorithm, CliP infers the population structure of heterogeneous tumors. It is the first method to use a regularized maximum likelihood framework for subclonal reconstruction, and it thereby benefits from that framework's computational efficiency in parameter estimation.
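The objective described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the expected variant allele fraction used here is an assumed, commonly used parameterization, and the function names, the dictionary keys (`alt`, `depth`, `mult`, `cn`), and the SCAD shape parameter `a = 3.7` are illustrative choices. The sketch only evaluates the penalized objective; the ADMM solver that minimizes it is not shown.

```python
import math
from itertools import combinations


def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at |t| (a = 3.7 is a conventional default)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def expected_vaf(cp, multiplicity, tumor_cn, purity, normal_cn=2):
    """ASSUMED model: mutated copies divided by the average copy number per cell.

    cp is taken to be the fraction of all cells in the sample carrying the mutation.
    """
    return multiplicity * cp / (purity * tumor_cn + (1 - purity) * normal_cn)


def binomial_nll(alt_reads, depth, vaf, eps=1e-12):
    """Binomial negative log-likelihood, dropping the constant binomial coefficient."""
    vaf = min(max(vaf, eps), 1 - eps)
    return -(alt_reads * math.log(vaf) + (depth - alt_reads) * math.log(1 - vaf))


def penalized_objective(cps, mutations, purity, lam):
    """Data-fit term plus a SCAD penalty on every pairwise difference in CP."""
    fit = sum(
        binomial_nll(m["alt"], m["depth"],
                     expected_vaf(cp, m["mult"], m["cn"], purity))
        for cp, m in zip(cps, mutations)
    )
    fuse = sum(scad_penalty(ci - cj, lam) for ci, cj in combinations(cps, 2))
    return fit + fuse


# Hypothetical toy data: three mutations in a sample with purity 0.8.
toy_mutations = [
    {"alt": 20, "depth": 100, "mult": 1, "cn": 2},
    {"alt": 22, "depth": 100, "mult": 1, "cn": 2},
    {"alt": 40, "depth": 100, "mult": 1, "cn": 2},
]
value = penalized_objective([0.4, 0.44, 0.8], toy_mutations, purity=0.8, lam=0.1)
```

Because the SCAD penalty is flat once a pairwise difference exceeds `a * lam`, well-separated CPs are not shrunk toward each other, while nearby CPs are pulled to a common value. Mutations whose estimated CPs coincide then form a cluster, which is how a pairwise penalty of this kind can determine the number of subclones without specifying it in advance.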
Results: Extensive simulations demonstrate that CliP is 100 times faster than Markov chain Monte Carlo (MCMC)-based algorithms with no loss in performance. Unlike previous models, the CliP model is applicable to regions with or without CNAs. Furthermore, CliP generates the subclonal structure without prior knowledge of the number of subclones and without post-processing. In an application to WGS data from the Pan-Cancer Analysis of Whole Genomes (PCAWG), CliP took only 8 hours to process 2,500 tumor samples.

Conclusion: Because CliP executes quickly, it is suitable for 1) processing large datasets with thousands of samples and 2) inclusion in an ensemble of methods used to generate consensus calls. As datasets continue to grow, CliP represents an important step towards fast and accurate subclonal reconstruction.

Citation Format: Yujie Jiang, Kaixian Yu, Hongtu Zhu, Wenyi Wang. CliP: A model-based method for subclonal architecture reconstruction using regularized maximum likelihood estimation [abstract]. In: Proceedings of the AACR Virtual Special Conference on Tumor Heterogeneity: From Single Cells to Clinical Impact; 2020 Sep 17-18. Philadelphia (PA): AACR; Cancer Res 2020;80(21 Suppl):Abstract nr PO-029.