Abstract

Broad copy number alterations (CNAs) at chromosomal band resolution have been used as cancer diagnostic and prognostic biomarkers for decades. These events have been characterized by cytogenetic imaging, an approach that is powerful for assessing heterogeneity but limited for fine mapping of loci. Although CNA detection by next-generation sequencing has become a standard analysis, existing methods rarely model CNA heterogeneity in bulk tumor samples and often require a paired normal sample as a control. To overcome these limitations, we developed Seq2Karyotype (S2K, https://github.com/chenlab-sj/Seq2Karyotype), a new algorithm for in-silico karyotyping using single-sample whole-genome sequencing (WGS) data. S2K jointly models read depth and allelic imbalance (AI) of high-quality heterozygous SNPs to identify reference diploid regions, then models and segments deviations from that reference. The empirical coverage and AI of each segmented region are fitted to models of single and admixed CNAs to estimate clonality. The final karyotype considers both model fit and minimization of the number of evolutionary steps.

To evaluate S2K's performance, we analyzed two cell lines commonly used for benchmarking: COLO829 (melanoma) and HCC1395 (breast cancer); both have single-cell (sc) WGS data for validation. For COLO829, S2K replicated the four populations detected by scWGS but derived a different clonality estimate. The predominant clone, defined by loss of 1p, 10p, and chr18, was estimated to have a 67% cellular fraction (CF) by S2K analysis of the bulk sample, in contrast to the 10% CF obtained by scWGS analysis.
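To make the clonality estimation concrete, the following is a minimal sketch of how read depth and AI can jointly constrain the cellular fraction of a single CNA. It assumes a textbook two-population mixture (cells carrying the CNA mixed with diploid cells) and a simple grid search; it is not S2K's actual model or code, and the function names `expected_signal` and `estimate_cf` are hypothetical.

```python
def expected_signal(cf, total_cn, minor_cn):
    """Expected (depth ratio, minor-allele fraction) for a segment where a
    fraction `cf` of cells carries a CNA with `total_cn` copies, `minor_cn`
    of them from the minor allele, and the remaining cells are diploid (1+1).
    Assumption: a plain two-population mixture, not S2K's own model."""
    mean_cn = cf * total_cn + (1.0 - cf) * 2.0
    depth_ratio = mean_cn / 2.0  # coverage relative to a diploid reference
    if mean_cn == 0.0:
        # Fully clonal homozygous deletion: no reads, allele fraction undefined.
        return depth_ratio, 0.0
    minor_fraction = (cf * minor_cn + (1.0 - cf)) / mean_cn
    return depth_ratio, minor_fraction


def estimate_cf(depth_ratio, minor_fraction, total_cn, minor_cn, steps=1000):
    """Grid-search the cellular fraction in [0, 1] that best reproduces the
    observed depth ratio and minor-allele fraction (least squares)."""
    def squared_error(cf):
        d, b = expected_signal(cf, total_cn, minor_cn)
        return (d - depth_ratio) ** 2 + (b - minor_fraction) ** 2

    return min((i / steps for i in range(steps + 1)), key=squared_error)


# Illustrative inputs only: a one-copy loss present in 67% of cells.
depth, maf = expected_signal(0.67, total_cn=1, minor_cn=0)
cf_loss = estimate_cf(depth, maf, total_cn=1, minor_cn=0)

# Copy-neutral LOH (e.g. UPD) leaves depth unchanged, so only AI is informative.
depth_upd, maf_upd = expected_signal(0.60, total_cn=2, minor_cn=0)
cf_upd = estimate_cf(depth_upd, maf_upd, total_cn=2, minor_cn=0)
```

Under these assumptions, a copy-neutral event is invisible to read depth alone, which is why the sketch scores both signals together.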
In HCC1395, S2K identified three new CNAs present in >50% of the cells of the matching germline sample, which were subsequently validated by FISH, karyotyping, and SKY mapping.

To demonstrate S2K's utility, we analyzed three diverse data sets: 17 neuroblastoma cell lines, 24 pediatric AML samples with karyotyping data, and two blood samples from children with myelodysplastic syndromes (MDS) known to harbor mosaic uniparental disomy (UPD). CNA-based intra-tumor heterogeneity was detected in 88% (15/17) of the neuroblastoma cell lines, which comprised 2-4 distinct populations with a wide range of CF (~10%-90%) and varying CNA patterns (e.g., an admixture of tetraploid and diploid cells); these findings were validated by cytogenetics or scWGS. In the patient AML samples, S2K detected >95% of the previously reported cytogenetic events and 30% of additional copy-neutral loss-of-heterozygosity events. The mosaic UPD events in the two MDS patients were detected with projected clonality of 60% and 25%, respectively. These results not only demonstrate the accuracy of in-silico karyotyping performed by S2K but also reveal the dynamic intra-tumor heterogeneity in cancer cell lines, which may affect the design and interpretation of future experiments using these cell lines.

Citation Format: Limeng Pu, Karol Szlachta, Virginia Valentine, Xiaolong Chen, Jian Wang, Dennis Kennetz, Daniel Putnam, Sivaraman Natarajan, Li Dong, Thomas Look, Marcin Wlodarski, Lu Wang, Steven Burden, John Easton, Xiang Chen, Jinghui Zhang. Seq2Karyotype (S2K): A method for deconvoluting heterogeneity of copy number alterations using single-sample whole-genome sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7419.