Abstract

In many cancer types, the evolution of subclonal malignant cells leads to a diverse tumor population that can affect treatment efficacy. In addition, evolution throughout treatment can lead to resistant subclonal populations and eventual relapse. To effectively treat these cancers, it is essential to understand the subclonal architecture of the tumor, including the somatic mutations that differentiate each population. Many tools have been developed to cluster mutations or reconstruct the clonal architecture of a given tumor sample. However, few can accommodate multiple samples, which are common in longitudinal or metastasis studies. In addition, these tools typically focus on just one of the steps necessary for a full analysis (e.g., variant clustering, subclone hierarchical tree reconstruction, or tree visualization). Many require non-standard data formats, necessitating data format conversion between steps. These requirements make it very challenging for investigators without a deep understanding of each tool to perform a full subclonal analysis. To make the identification of temporal and spatial heterogeneity more streamlined and accessible to analysts with basic programming knowledge, we have developed a pipeline that takes somatic mutations in standard VCF file format as input and produces results that are easy to interpret. Variant data is extracted and formatted as input for PyClone-VI (Gillis and Roth, 2020), which is used to cluster somatic variants based on their allele frequencies at each time point. SuperSeeker, an improvement on the SubcloneSeeker method developed within our lab (Qiao et al., 2014), implements advanced tumor subclone reconstruction algorithms to jointly analyze multiple samples and construct hierarchical trees that account for all subclones observed in a patient. A sample trace, i.e., the cellular fraction of each subclone found in each sample, is also provided.
The output of SuperSeeker is an updated VCF file with the tree and sample trace information added to the header. Finally, a GraphViz rendering of the hierarchical tree is generated for easy visualization. In addition to this rendering, the final VCF file can also be used for interactive analysis and visualization of the results using our Oncogene.iobio web tool. This pipeline makes identifying temporal and spatial heterogeneity more efficient, requiring only standard file types as input and basic command-line knowledge. In a recently published study, we investigated subclonal evolution in 38 patients with chronic lymphocytic leukemia (CLL) being treated with a Bruton tyrosine kinase inhibitor (BTKi). The initial subclonal architecture analysis for each patient in this study was laborious and time intensive. Using this pipeline, we can now identify subclonal evolution in these patients and create an interactive visualization more efficiently. This approach can streamline analyses, increase efficiency, and lead to a deeper understanding of a tumor’s subclonal evolution.

Citation Format: Gage Black, Yi Qiao, Xiaomeng Huang, Gabor Marth. Streamlining the reconstruction of subclonal evolution in DNA sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4295.
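
The abstract does not show how variant data is extracted from the VCF and formatted for PyClone-VI, so the following is only a minimal sketch of what such a step might look like, not the pipeline's actual code. It assumes that each sample column carries an AD (allelic depth) FORMAT field, that sites are biallelic, and that every variant lies in a copy-neutral diploid region (major_cn=1, minor_cn=1, normal_cn=2). The function names, sample names, and coordinates are hypothetical; the output columns follow the tab-delimited PyClone-VI input format.

```python
import csv
import io

# Column names of the tab-delimited PyClone-VI input file.
PYCLONE_VI_COLUMNS = [
    "mutation_id", "sample_id", "ref_counts", "alt_counts",
    "major_cn", "minor_cn", "normal_cn",
]


def vcf_to_pyclone_rows(vcf_lines, major_cn=1, minor_cn=1, normal_cn=2):
    """Return one PyClone-VI row per (variant, sample) from VCF text lines.

    Assumptions (not stated in the abstract): AD is present in FORMAT,
    sites are biallelic, and copy number is diploid unless overridden.
    """
    samples = []
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _variant_id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multiallelic sites are skipped in this sketch
        format_keys = fields[8].split(":")
        if "AD" not in format_keys:
            continue  # no allelic depths available for this record
        ad_index = format_keys.index("AD")
        for sample, value in zip(samples, fields[9:]):
            parts = value.split(":")
            if ad_index >= len(parts):
                continue  # trailing FORMAT fields may be omitted
            depths = parts[ad_index].split(",")
            if len(depths) < 2 or "." in depths:
                continue  # missing read counts
            rows.append({
                "mutation_id": f"{chrom}:{pos}:{ref}>{alt}",
                "sample_id": sample,
                "ref_counts": int(depths[0]),
                "alt_counts": int(depths[1]),
                "major_cn": major_cn,
                "minor_cn": minor_cn,
                "normal_cn": normal_cn,
            })
    return rows


def write_pyclone_tsv(rows, handle):
    """Write rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=PYCLONE_VI_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical two-sample (longitudinal) VCF with a single variant.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbaseline\trelapse",
    "chr1\t12345\t.\tA\tT\t.\tPASS\t.\tGT:AD\t0/1:60,40\t0/1:30,70",
]
rows = vcf_to_pyclone_rows(example_vcf)
buffer = io.StringIO()
write_pyclone_tsv(rows, buffer)
print(buffer.getvalue())
```

In a real analysis the copy-number columns would more plausibly come from a copy-number caller than from a fixed diploid default, since allele frequencies in amplified or deleted regions do not map directly to cellular fractions.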