Abstract

The characterization of the landscape of genetic lesions that underlie cancer has been significantly advanced by the recent application of next-generation sequencing (NGS) technology. This methodology can be used to sequence selected subsets of genes, the whole exome, the whole genome, or the expressed transcriptome within a cancer cell. By comparing the sequences acquired from a cancer sample and a matched normal tissue sample from the same patient, one should be able to identify almost all somatic lesions within the cancer. As part of the St Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project (PCGP), we have undertaken whole genome sequencing (WGS) of 600 pediatric cancers and their matched control tissues (1,200 genomes in total). Although the acquisition of the primary sequence is a formidable challenge, the analysis of these data is where the real work begins. Unfortunately, the majority of published NGS analysis methods were developed to identify germ line variation and therefore perform sub-optimally when applied to the task of identifying somatic mutations in cancer genomes. This is in part a result of the distinct logic that must be used to accurately identify all somatic lesions within a cancer. A cancer genome typically exists within a heterogeneous DNA sample that is composed of normal cells admixed with an oligoclonal tumor population. Moreover, the range of somatic lesions seen in cancer is broader than what is seen in germ line genetic variation, with some cancers having exceedingly complex genomes containing focal insertions, deletions, inversions, intra-chromosomal and inter-chromosomal rearrangements, and large copy number abnormalities. Accurately identifying these lesions requires demonstrating not only their presence in the cancer DNA but also their absence from the matched germ line sample.
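To make the effect of normal-cell admixture concrete, the following is a minimal sketch of how the fraction of cells carrying a mutation translates into the number of sequencing reads expected to show it. It is not part of the PCGP pipeline. It assumes a diploid locus with no copy number change, a simple binomial sampling model, and illustrative values for read depth, mutated cell fraction, and the minimum number of supporting reads; the function names are hypothetical.

```python
from math import comb


def expected_vaf(cell_fraction: float, mutant_copies: int = 1, ploidy: int = 2) -> float:
    """Expected variant allele fraction when `cell_fraction` of all cells carry
    the mutation on `mutant_copies` of `ploidy` copies (assumption: every cell,
    tumor or normal, has the same ploidy at this locus)."""
    return cell_fraction * mutant_copies / ploidy


def prob_at_least(k: int, depth: int, vaf: float) -> float:
    """Probability of observing at least k variant reads out of `depth` reads,
    under a binomial model that ignores sequencing and mapping errors."""
    below = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k))
    return 1.0 - below


# Illustrative values only (assumptions, not results from the abstract).
depth = 30            # reads covering the site
cell_fraction = 0.25  # fraction of cells carrying a mono-allelic mutation
min_reads = 3         # hypothetical minimum supporting reads for a call

vaf = expected_vaf(cell_fraction)
expected_reads = depth * vaf
p_detect = prob_at_least(min_reads, depth, vaf)

print(f"expected VAF: {vaf}")
print(f"expected variant reads: {expected_reads}")
print(f"P(at least {min_reads} variant reads): {p_detect:.3f}")
```

Under this simplified model, a mono-allelic mutation present in a minority of cells is supported by only a handful of reads at moderate depth, which is why germ line callers that expect allele fractions near 50% or 100% tend to miss such variants.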
To approach these problems, we, as well as others, have recently developed new analytical approaches to enhance our ability to identify the somatic mutations in cancer. The starting point for these analyses is ≥75 bp paired-end sequencing reads from patient-matched tumor and normal DNA samples. Our goal is to identify all somatic single-nucleotide variations (SNVs), small insertions/deletions (indels), copy number alterations (CNAs) and structural variations (SVs) that occur within the cancer DNA sample. Paired tumor-normal NGS data were analyzed together to ensure sensitivity for detecting DNA alterations in the tumor and for confirming their absence in the matched normal sample. Somatic lesions initially identified from mapped NGS reads were further analyzed using more accurate algorithms to correct errors caused by suboptimal NGS mapping. The sensitivity of the methods we have developed depends on the read depth, but with WGS at 30X haploid coverage we are able to detect mono-allelic mutations present in as few as ∼25% of the cells in the analyzed population. This sensitivity can be significantly enhanced with greater read depth. Key among the methods we have developed are two new algorithms focused on identifying gross DNA alterations: CREST (Clipping REveals STructure) for SV analysis and CONSERTING (COpy Number SEgmentation by Regression Tree) for CNA analysis. CREST uses sequencing reads with partial alignments to the reference human genome (so-called soft-clipped reads) to directly map the breakpoints of somatic SVs. CONSERTING integrates read depth analysis with SV detection and adjusts for sequencing artifacts, coverage bias and germ line CNVs. Together, these methods identify somatic lesions with a high validation rate (92-98% of SNVs and indels, 80% of SVs). In this talk, I will highlight the NGS analytic pipeline we have developed and the recent discoveries that have emerged through its application to pediatric cancer genomes.
In addition, I will point out some of the significant challenges that remain to be tackled in order for us to identify the full landscape and functional consequences of the somatic mutations in cancer.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr SY25-01. doi:1538-7445.AM2012-SY25-01
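The idea of using soft-clipped reads to locate SV breakpoints, mentioned in the abstract, can be illustrated with a short sketch. This is not the CREST algorithm, which involves considerably more than is shown here; it only shows how the position where a read's alignment stops can be read off a SAM-style CIGAR string and how reads that agree on a position can be tallied. The function names, the input tuple layout, and the thresholds `min_clip` and `min_reads` are all assumptions made for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clip_breakpoints(pos: int, cigar: str):
    """Return (side, reference_position, clip_length) for each soft-clipped end.

    `pos` is the 1-based leftmost aligned position, as in the SAM POS field.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    found = []
    if ops and ops[0][1] == "S":
        # Left clip: the alignment starts at `pos`, so the break lies just before it.
        found.append(("left", pos, ops[0][0]))
    if len(ops) > 1 and ops[-1][1] == "S":
        # Right clip: the break lies just after the last aligned reference base.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        found.append(("right", pos + ref_len - 1, ops[-1][0]))
    return found


def candidate_breakpoints(alignments, min_clip=10, min_reads=2):
    """Count soft-clipped reads that agree on chromosome, position and side."""
    support = Counter()
    for chrom, pos, cigar in alignments:
        for side, bp, clip_len in soft_clip_breakpoints(pos, cigar):
            if clip_len >= min_clip:
                support[(chrom, bp, side)] += 1
    return {site: n for site, n in support.items() if n >= min_reads}


# Hypothetical alignments: (chromosome, POS, CIGAR)
reads = [
    ("chr1", 1000, "60M40S"),
    ("chr1", 1020, "40M60S"),
    ("chr1", 1059, "5S95M"),   # clip too short to count
    ("chr1", 900, "100M"),     # fully aligned, no clip
]
print(candidate_breakpoints(reads))
```

In a somatic setting, a site found this way in the tumor sample would additionally need to be checked for the absence of supporting reads in the matched normal sample.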