Abstract

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at equivalent to 100-kilobase resolution genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for integration into existing analysis pipelines. CNVkit is freely available from https://github.com/etal/cnvkit.

Highlights

  • Copy number changes are a useful diagnostic indicator for many diseases, including cancer

  • Tools have been developed for copy number analysis of these datasets, as well, including CNVer [6], ExomeCNV [7], exomeCopy [8], CONTRA [9], CoNIFER [10], ExomeDepth [11], VarScan 2 [12], XHMM [13], ngCGH [14], EXCAVATOR [15], CANOES [16], PatternCNV [17], CODEX [18], and recent versions of Control-FREEC [19] and cn.MOPS [20]

  • We evaluated our method on DNA sequencing data from targeted sequencing of the melanoma cell line C0902 [42] and two sets of samples, referred to here as “Targeted sequencing (TR)” and “exome panel (EX)”, derived from a recent study of advanced melanomas [43]:

Read more

Summary

Introduction

Copy number changes are a useful diagnostic indicator for many diseases, including cancer. The on– and off-target read depths are combined, normalized to a reference derived from control samples, corrected for several systematic biases to result in a final table of log2 copy ratios.

Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.