Abstract

Aneuploidy, the phenomenon in which genomes gain or lose chromosomal fragments, has been causally implicated in a wide variety of human cancers. Defining which copy number alterations (CNAs) are pathogenic is an important goal of cancer research. However, data from bulk samples cannot fully depict tumor heterogeneity and evolution, which occur in single cells, and thus have limited power to discover CNAs useful for cancer diagnostics and therapeutics. Recent advances in single-cell DNA sequencing have enabled the acquisition of single-cell copy number (SCCN) profiles from tens of thousands of cells, which potentiates reconstruction of copy number evolution lineages and discovery of novel cancer genes. However, current analytical approaches that infer clonal lineages are neither accurate nor scalable to molecular features, particularly CNA profiles, obtained from thousands of cells. Moreover, statistical routines have not been established to leverage lineage tracing results for identifying cancer-related genes. Here, we present the Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm, which infers the evolutionary history of a cell population from SCCN profiles. In a MEDALT, each node represents a cell, each edge represents kinship between two cells, arrows point towards younger cells, and the root represents a normal diploid cell. We also present a statistical routine named lineage speciation analysis, which facilitates discovery of fitness-associated alterations (FAAs) and genes from SCCN lineage trees. To evaluate MEDALT, we simulated copy number evolution under various CNA mechanisms, such as genome doubling and breakage-fusion-bridge (BFB), and spiked in FAAs. We found that MEDALT substantially improved the accuracy of FAA identification over GISTIC and conventional phylogenetic methods such as maximum likelihood (ML), maximum parsimony (MP) and neighbor joining (NJ) trees.
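To make the tree structure described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified event distance (the minimum number of single-copy gains or losses over contiguous bins needed to turn one profile into another, ignoring the constraint that a fully deleted segment cannot be regained) and a greedy Prim-style tree rooted at an all-diploid profile. The names `event_distance` and `build_lineage_tree`, the bin layout, and the toy profiles are all hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # integer copy number per genomic bin (assumed layout)


def event_distance(a: Profile, b: Profile) -> int:
    """Simplified event distance between two copy number profiles.

    Counts the minimum number of +1/-1 operations on contiguous bin ranges
    that transform `a` into `b`. Assumption: copy number zero is not treated
    specially, unlike a biologically faithful distance.
    """
    diff = [y - x for x, y in zip(a, b)]
    padded = [0] + diff + [0]
    # Each contiguous-range event adds +1 at one boundary of the difference
    # sequence and -1 at another, so the positive steps count the events.
    return sum(max(0, padded[i] - padded[i - 1]) for i in range(1, len(padded)))


def build_lineage_tree(cells: Dict[str, Profile], n_bins: int) -> List[Tuple[str, str, int]]:
    """Greedy (Prim-style) tree rooted at a diploid cell.

    Returns directed edges (parent, child, distance) pointing away from the
    root, i.e. towards younger cells. Assumes no cell is named "root".
    """
    root = "root"
    profiles = {root: (2,) * n_bins, **cells}  # assumption: autosomes only
    in_tree = [root]
    remaining = set(cells)
    edges: List[Tuple[str, str, int]] = []
    while remaining:
        # Attach the unplaced cell that is closest to any cell already placed.
        dist, parent, child = min(
            (event_distance(profiles[p], profiles[c]), p, c)
            for p in in_tree
            for c in sorted(remaining)
        )
        edges.append((parent, child, dist))
        in_tree.append(child)
        remaining.remove(child)
    return edges


if __name__ == "__main__":
    toy_cells = {
        "cell_a": (2, 3, 3, 2),  # one gain spanning two bins
        "cell_b": (2, 3, 3, 1),  # same gain plus a loss in the last bin
    }
    for parent, child, dist in build_lineage_tree(toy_cells, n_bins=4):
        print(f"{parent} -> {child} (distance {dist})")
```

Because this simplified distance is symmetric, an undirected spanning tree oriented away from the root is sufficient here; an asymmetric distance (for example, one that forbids regaining deleted segments) would call for a directed minimum spanning tree instead.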
We applied our methods to single-cell DNA sequencing data acquired from 20 triple-negative breast cancer (TNBC) patients, 4 of whom had longitudinal pre-, mid- and post-treatment samples. Based on the constructed MEDALTs, most of the TNBC samples appeared to have developed through branched evolution via multiple parallel lineages with distinct mutation rates and DNA damage repair (DDR) loss. Using our approaches, we discovered novel genes that are predictive of patient survival in TCGA breast cancer data and are functionally more essential than control gene sets, based on CRISPR-Cas9 knockout screen data obtained from 27 breast cancer cell lines in the DepMap database. Significant benefits in lineage tracing and cancer gene discovery were also achieved when applying our approaches to SCCN profiles derived using InferCNV from the single-cell RNA sequencing data of a cohort of 20 multiple myeloma, head and neck cancer, oral squamous cell carcinoma and ovarian cancer patients. The source code of our study is available at https://github.com/KChen-lab/MEDALT.

Citation Format: Fang Wang, Qihan Wang, Vakul Mohanty, Shaoheng Liang, Jinzhuang Dou, Jincheng Han, Darlan Minussi, Ruli Gao, Li Ding, Nicholas Navin, Ken Chen. Single-cell copy number heterogeneity tracing enabling cancer gene discovery [abstract]. In: Proceedings of the AACR Virtual Special Conference on Tumor Heterogeneity: From Single Cells to Clinical Impact; 2020 Sep 17-18. Philadelphia (PA): AACR; Cancer Res 2020;80(21 Suppl):Abstract nr PO-017.
