Abstract

Background: High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of cancer. Copy number aberration (CNA) is an important type of genomic change, but detecting and characterizing CNA from whole-exome sequencing (WES) is challenging due to the high level of biases and artifacts.
Methods: We propose CODEX, a normalization and CNA calling procedure for WES data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based WES data. CODEX can be used to detect both germline and somatic CNAs in cancer samples with or without matched normal.
Results: In in silico spike-in studies, CODEX is more effective than existing approaches in removing the biases in WES, and attains better sensitivity and specificity in detecting copy number aberrations. We further evaluate performance on 222 neuroblastoma samples with matched normal from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Project. We carry out systematic genome-wide analysis and detailed characterization of both germline and somatic copy number events. With a focus on a well-studied rare somatic CNV within the ATRX gene, we show that the cross-sample normalization procedure of CODEX is more effective in removing noise than the standard pipeline of normalizing the tumor against the matched normal, and that the segmentation procedure performs well in detecting CNVs with recurrent complex nested structures. For detecting germline mutations, CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and is shown to perform well on three microarray-based validation data sets.
Conclusions: The cross-sample normalization procedure of CODEX, when applied to the matrix of tumor and normal samples, is more effective in reducing noise than normalizing each tumor to its matched normal. The somatic deletions in the ATRX region have a nested structure, which CODEX was able to recover. Through multiple types of validation, CODEX is shown to be applicable to a wide range of study designs for copy number estimation using WES data.

Citation Format: Yuchao Jiang, Derek A. Oldridge, Sharon J. Diskin, Nancy R. Zhang. CODEX: a normalization and copy number variation detection method for whole-exome sequencing. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4868. doi:10.1158/1538-7445.AM2015-4868
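
To make the idea of a Poisson likelihood-based segmentation concrete, the following is a minimal sketch and not the CODEX implementation. It assumes that, for one sample, each exon has an observed read count and an expected count under the normalized null model, and that a copy number change multiplies the expected counts in a segment by a common factor. The function names (`poisson_llr`, `best_segment`) and the toy data are hypothetical and chosen only for illustration. The sketch scores every candidate segment by a Poisson log-likelihood ratio and returns the highest-scoring one; a recursive procedure could then repeat the scan on the flanking regions.

```python
import math


def poisson_llr(y_sum, lam_sum):
    """Poisson log-likelihood ratio for one candidate segment.

    y_sum   : total observed read count in the segment
    lam_sum : total expected count under the null (no copy number change)
    The alternative scales the expected counts by a factor mu, evaluated
    at its maximum likelihood estimate mu_hat = y_sum / lam_sum.
    """
    if lam_sum <= 0:
        raise ValueError("expected count must be positive")
    if y_sum == 0:
        # Limit of y*log(y/lam) - (y - lam) as y -> 0
        return lam_sum
    return y_sum * math.log(y_sum / lam_sum) - (y_sum - lam_sum)


def best_segment(observed, expected, max_len=None):
    """Scan all segments [start, end) and return the one with the largest
    log-likelihood ratio as (llr, start, end, mu_hat)."""
    n = len(observed)
    if n == 0 or n != len(expected):
        raise ValueError("observed and expected must be non-empty and equal length")
    if max_len is None:
        max_len = n

    # Prefix sums give each segment total in constant time.
    y_cum = [0.0]
    lam_cum = [0.0]
    for y, lam in zip(observed, expected):
        y_cum.append(y_cum[-1] + y)
        lam_cum.append(lam_cum[-1] + lam)

    best = (float("-inf"), 0, 0, 1.0)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            y_sum = y_cum[end] - y_cum[start]
            lam_sum = lam_cum[end] - lam_cum[start]
            llr = poisson_llr(y_sum, lam_sum)
            if llr > best[0]:
                best = (llr, start, end, y_sum / lam_sum)
    return best


# Toy example (made-up numbers): exons 3 to 5 have roughly half the
# expected coverage, as a heterozygous deletion would produce.
observed = [100, 98, 103, 50, 52, 49, 101, 99]
expected = [100.0] * 8
llr, start, end, mu_hat = best_segment(observed, expected)
print(f"segment [{start}, {end}) mu_hat={mu_hat:.3f} llr={llr:.2f}")
```

Working with counts directly, rather than log ratios, keeps low-coverage exons from being over-weighted, which is one reason a count-based likelihood is a natural fit for WES data. The estimated factor `mu_hat` would still need to be converted to a copy number and thresholded, and those steps are omitted here.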
