Benchmarking germline CNV calling tools from exome sequencing data

Veronika Gordeeva, Elena Sharova, Konstantin Babalyan, Rinat Sultanov, Vadim M Govorun, Georgij Arapidi

doi:10.1038/s41598-021-93878-2

Abstract

Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and potential ability to detect copy number variations (CNVs) of various sizes (from 1–2 exons to several Mb). Previous comparisons of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, owing to the lack of a gold-standard CNV set, the results are limited and not directly comparable. Here, we aimed to perform a comprehensive analysis of the currently available tools capable of germline CNV calling, using a single CNV standard and reference sample set. Compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (pilot National Institute of Standards and Technology Reference Material) comprising 110,050 CNV or non-CNV exons. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample, with 10 correlated exomes as a reference set, with respect to length distribution, concordance, and efficiency. Each algorithm had a certain range of detected lengths and showed low concordance with other tools. Most tools focus on detecting a limited number of CNVs one to seven exons long with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV focused on detecting a wide range of variations but showed low precision. Upon unified comparison, the tools were not equivalent. The analysis performed allows choosing the algorithms or ensembles of algorithms most suitable for a specific goal, e.g. population studies or medical genetics.
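As a rough illustration of how a tool's exon-level calls could be scored against a standard made of CNV and non-CNV exons, the Python sketch below counts true and false positives and derives precision and recall. The function name `score_calls`, the toy exon IDs, and the decision to ignore called exons that are absent from the standard are assumptions made for illustration; they are not details taken from the study.

```python
def score_calls(called_exons, cnv_exons, non_cnv_exons):
    """Score exon-level calls against a standard of CNV and non-CNV exons."""
    called = set(called_exons)
    tp = len(called & cnv_exons)      # called, and CNV in the standard
    fp = len(called & non_cnv_exons)  # called, but non-CNV in the standard
    fn = len(cnv_exons - called)      # CNV in the standard, but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical toy standard and calls (exon IDs are made up).
cnv_exons = {"e1", "e2", "e3"}
non_cnv_exons = {"e4", "e5", "e6", "e7"}
calls = ["e1", "e2", "e4", "e9"]  # "e9" is outside the standard and is ignored

result = score_calls(calls, cnv_exons, non_cnv_exons)
print(result)
```

Restricting the counts to exons present in the standard means a call is never penalized for a region whose true status is unknown; whether that matches the scoring used in the paper is not stated in this summary.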

Highlights

Copy number variations (CNVs) are variations of the number of copies of a DNA fragment in a population
To perform a unified comparative analysis: (1) we chose NA12878 as one of the most characterized samples of the Genome in a Bottle project; (2) we used the exon as the minimal unit for comparison; (3) we constructed the set of CNV and non-CNV exons based on available CNV sets for NA12878 using a Bayes model; and (4) we evaluated the performance of 16 existing germline CNV tools (Table 1) using the same reference set
CNV is an important type of structural variation, accurate detection and interpretation of which are essential for population studies, medical genetics, evolution, and cancer research
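Using the exon as the minimal unit for comparison implies that interval-style calls from different tools have to be projected onto a common exon list. A minimal Python sketch of one way to do that follows. The name `exons_hit`, the half-open coordinates, the toy exon table, and the 50% coverage threshold are all illustrative assumptions rather than the procedure used in the paper.

```python
def exons_hit(call, exons, min_fraction=0.5):
    """Return IDs of exons covered by the call for at least min_fraction
    of their length. Coordinates are treated as half-open [start, end)."""
    chrom, start, end = call
    hits = []
    for exon_id, (e_chrom, e_start, e_end) in exons.items():
        if e_chrom != chrom:
            continue
        overlap = min(end, e_end) - max(start, e_start)
        if overlap > 0 and overlap / (e_end - e_start) >= min_fraction:
            hits.append(exon_id)
    return hits


# Hypothetical exon table: ID -> (chromosome, start, end).
exons = {
    "exonA": ("chr1", 100, 200),
    "exonB": ("chr1", 300, 400),
    "exonC": ("chr1", 500, 600),
    "exonD": ("chr2", 100, 200),
}
call = ("chr1", 150, 520)  # one hypothetical CNV call

hit_ids = exons_hit(call, exons)
print(hit_ids)
```

Once every tool's calls are reduced to sets of exon IDs in this way, calls of very different lengths can be compared on the same footing.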

Summary

Introduction

Copy number variations (CNVs) are variations of the number of copies of a DNA fragment in a population. Whole-exome sequencing (WES) has many features that impede accurate CNV detection. These include basic features (such as the capture step) and those originating from the PCR stages (problems with sequencing low-complexity regions, dependence on GC content), which directly affect the over- and underrepresentation of target regions; such coverage artifacts can be mistakenly interpreted as CNVs. Multiple tools have been developed to detect CNVs in exome data; they mainly use the read depth-based strategy, in which the number of reads (read count, RC) mapped onto a fragment of interest is evaluated [9,10]. These tools vary greatly at every step of the analysis, including the read-depth distribution assumption, RC data normalization, and segmentation approach (Table 1). One such combination, for example, is base-level log-ratios, GC-content and library-size correction, calling regions significant based on a normal distribution, and circular binary segmentation (CBS) for large variations.
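To make the read depth-based strategy more concrete, the Python sketch below normalizes per-exon read counts by library size, computes a log2 ratio of the test sample against the mean of a reference set, and labels exons with a fixed cutoff. It is a deliberately simplified illustration and does not reproduce any of the benchmarked tools: the function names, the pseudocount, the 0.4 threshold, and the toy counts are assumptions, and GC-content correction and segmentation are omitted.

```python
import math
from statistics import mean


def normalize(counts):
    """Scale raw read counts to counts per million (library-size correction)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def log2_ratios(sample, references, pseudo=1.0):
    """Per-exon log2 ratio of the sample to the mean of the reference samples."""
    s = normalize(sample)
    refs = [normalize(r) for r in references]
    ratios = []
    for i, value in enumerate(s):
        ref_mean = mean(r[i] for r in refs)
        ratios.append(math.log2((value + pseudo) / (ref_mean + pseudo)))
    return ratios


def call_exons(ratios, threshold=0.4):
    """Label each exon using a fixed log2-ratio cutoff (an assumed value)."""
    labels = []
    for r in ratios:
        if r <= -threshold:
            labels.append("deletion")
        elif r >= threshold:
            labels.append("duplication")
        else:
            labels.append("normal")
    return labels


# Hypothetical read counts for five exons.
sample = [200, 200, 100, 200, 300]
references = [[200, 200, 200, 200, 200],
              [200, 200, 200, 200, 200]]

ratios = log2_ratios(sample, references)
labels = call_exons(ratios)
print([round(r, 2) for r in ratios], labels)
```

In a diploid region, losing one copy halves the expected depth (log2 ratio near -1) and gaining one copy raises it by half (log2 ratio near 0.58), which is why the third and fifth toy exons are labeled as a deletion and a duplication. Real tools typically replace the fixed cutoff with a statistical model of the read-depth distribution.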



Journal: Scientific reports	Publication Date: Jul 13, 2021
Citations: 46	License type: open-access


