Abstract

Detection and characterization of somatic structural variants (SVs) and copy-number variants (CNVs) from whole genome sequencing remain a challenging part of cancer analysis. Many callers have been developed that use different detection strategies, but most methods suffer from high rates of false positives and false negatives, and agreement between different callers is usually low. We have developed a flexible pipeline that combines the results of multiple callers, filters calls to remove likely artifacts, and functionally annotates the resulting variants. We employ a diverse set of variant callers utilizing a combination of read depth, read pair, and split read detection methods: NBIC-seq (Xi et al., 2011), Crest (Wang et al., 2011), Delly (Rausch et al., 2012), and BreakDancer (Chen et al., 2009). To remove artifact calls due to mis-mapping, we apply filters that discard predicted SVs whose breakpoints exhibit certain sequence features (e.g., extensive mapping ambiguity, high repeat content). SVs corresponding to known germline variants (1000G, DGV, in-house database) are marked and removed as unlikely somatic variants; this greatly helps to prevent both sequencing protocol- and caller-specific artifacts as well as false-positive somatic calls arising from missed calls in the matched germline sample. Finally, we employ our sensitive split read mapper SplazerS to identify SV breakpoints with base-pair precision. In this step, we are also able to remove remaining germline variants for which we find split read support in the matched normal sample. The final predicted structural variants are annotated for overlap with SVs in COSMIC, overlap with known cancer genes, and potential impact on gene structure. We use a public synthetic data set (DREAM challenge; Boutros et al., 2014) to demonstrate that using our selected ensemble of tools significantly improves sensitivity compared with any single caller and that our filters effectively remove artifacts.
Further, we show results from a set of colorectal cancer samples (Brannon et al., 2014) in which highly similar primary and metastatic tumors show excellent agreement in somatic SV calls, while unrelated samples show no overlap. Results from testing our pipeline on TCGA glioblastoma multiforme tumors, for which validated genomic rearrangements are available, will also be presented. In conclusion, our pipeline improves detection of SVs by integrating orthogonal calling methods and facilitates identification of clinically relevant SVs through effective filters and cancer-specific functional annotation.

Citation Format: Minita Shah, Dayna M. Oschwald, Soren Germer, Michael C. Zody, Toby Bloom, Anne-Katrin Emde. An integrated pipeline for detecting and characterizing structural variation in cancer. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4876. doi:10.1158/1538-7445.AM2015-4876
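The abstract does not specify how calls from different callers are merged or how matches to known germline variants are decided. As a rough illustration only, the following Python sketch merges interval-type calls by reciprocal overlap and then drops clusters that match a germline list. The `SV` record, the function names, the 50% reciprocal-overlap threshold, and the example coordinates are all assumptions made for this sketch, not details of the authors' pipeline; translocations and breakpoint-level matching are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (interval-type events only)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"
    caller: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer interval (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def same_event(a: SV, b: SV, min_ro: float = 0.5) -> bool:
    """Assumed matching rule: same type and sufficient reciprocal overlap."""
    return a.svtype == b.svtype and reciprocal_overlap(a, b) >= min_ro


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster calls that match the first call of an existing cluster."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if same_event(cluster[0], call, min_ro):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return clusters


def filter_germline(clusters, germline, min_ro=0.5):
    """Keep only clusters whose first call matches no known germline SV."""
    return [
        c for c in clusters
        if not any(same_event(c[0], g, min_ro) for g in germline)
    ]


# Made-up example calls; coordinates are illustrative, not real data.
calls = [
    SV("chr1", 1000, 5000, "DEL", "delly"),
    SV("chr1", 1050, 4950, "DEL", "breakdancer"),
    SV("chr2", 200, 900, "DUP", "crest"),
]
germline = [SV("chr2", 210, 880, "DUP", "known_db")]

merged = merge_calls(calls)
somatic = filter_germline(merged, germline)
for cluster in somatic:
    callers = sorted(c.caller for c in cluster)
    print(cluster[0].chrom, cluster[0].start, cluster[0].end, callers)
```

In this toy input, the two chr1 deletions collapse into one cluster supported by two callers, and the chr2 duplication is discarded because it matches the germline entry. A production pipeline would more likely match on breakpoint positions with caller-specific tolerances.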
