Abstract

Simple Summary: Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. For nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. We suggest combining the best tools that are also based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS.

Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow full genomic profiling with a single protocol. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. Additionally, for nine samples we performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools performed better for NA12878, which could be a result of overfitting. We suggest combining the best tools that are also based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. Filtering frequently called variants against background panels could potentially help reduce the total number of called variants.
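
To make the recall and precision comparison against array calls concrete, the following is a minimal sketch rather than the benchmark's actual procedure. It assumes that a tool call matches an array CNV when both have the same type and share at least 50% reciprocal overlap; that threshold, the `CNV` record, and the example coordinates are illustrative assumptions, not values taken from the study.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if no match is possible."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def precision_recall(calls, truth, min_overlap=0.5):
    """Precision: fraction of calls matching any truth CNV.
    Recall: fraction of truth CNVs matched by any call.
    min_overlap is an assumed threshold, not one reported in the text."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: two array CNVs, four tool calls (two of them unsupported).
array_cnvs = [CNV("chr1", 1000, 5000, "DEL"), CNV("chr2", 20000, 30000, "DUP")]
tool_calls = [
    CNV("chr1", 1200, 5100, "DEL"),
    CNV("chr2", 19000, 31000, "DUP"),
    CNV("chr3", 100, 900, "DEL"),
    CNV("chr4", 5000, 7000, "DUP"),
]
precision, recall = precision_recall(tool_calls, array_cnvs)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=1.00
```

In this toy case every array CNV is recovered, yet half of the calls have no array support, which is the pattern of high recall with poor precision described above.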

Highlights

  • Large cohort-based genome-wide association studies provided us with the tools and knowledge to understand numerous phenotypic traits and diseases by single nucleotide polymorphisms (SNPs) and short (

  • After reviewing 50 tools for copy-number variation (CNV) calling and benchmarking 11 of them (CLC Genomics Workbench (WGS and WES), cn.MOPS (WGS and WES), CNVkit (WES), CNVnator (WGS), CODEX2 (WGS), Control-FREEC (WGS), DELLY (WGS), ExomeDepth (WES), GATK gCNV (WGS and WES), Lumpy (WGS), and Manta (WGS and WES)), we conclude that CNV identification from next-generation sequencing (NGS) data remains challenging

  • With respect to reliable CNV calling from NGS data, we observed that tools did not perform well even when they were developed for WES data or accepted it as input

