Abstract

Background: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall.

Results: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms.

Conclusion: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
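The overlapping-call evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SV` record, the helper names, the 50% reciprocal-overlap matching criterion, and the toy coordinates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a format used in the paper)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        # Also covers zero-length records such as insertions, which would
        # need a distance-based criterion instead of an overlap-based one.
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def matches(call: SV, others: list, min_ro: float = 0.5) -> bool:
    """True if `call` reciprocally overlaps any SV in `others` (assumed 50% threshold)."""
    return any(reciprocal_overlap(call, o) >= min_ro for o in others)


def overlapping_calls(calls_a: list, calls_b: list, min_ro: float = 0.5) -> list:
    """Calls from algorithm A that are also supported by algorithm B."""
    return [c for c in calls_a if matches(c, calls_b, min_ro)]


def precision_recall(calls: list, reference: list, min_ro: float = 0.5) -> tuple:
    """Precision = matched calls / all calls; recall = matched reference SVs / all reference SVs."""
    true_calls = sum(matches(c, reference, min_ro) for c in calls)
    found_refs = sum(matches(r, calls, min_ro) for r in reference)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found_refs / len(reference) if reference else 0.0
    return precision, recall


# Toy data (made up): two reference deletions and two hypothetical call sets.
reference = [SV("chr1", 100, 200, "DEL"), SV("chr1", 1000, 2000, "DEL")]
calls_a = [SV("chr1", 105, 195, "DEL"), SV("chr1", 5000, 6000, "DEL"),
           SV("chr1", 1000, 2000, "DEL")]
calls_b = [SV("chr1", 100, 210, "DEL"), SV("chr1", 5000, 5100, "DEL")]

shared = overlapping_calls(calls_a, calls_b)
print(precision_recall(calls_a, reference))
print(precision_recall(shared, reference))
```

In this toy data, algorithm A alone reaches precision 2/3 and recall 1.0, while the calls it shares with algorithm B reach precision 1.0 and recall 0.5. This is the kind of trade-off that makes the choice of algorithm pair matter.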

Highlights

  • A reference SV dataset for the real data was generated by merging the Database of Genomic Variants (DGV) dataset corresponding to NA12878 with the INS, DEL, and INV data detected from NA12878 long-read assemblies (Additional file 1: Table S4; see the “Methods” section for details)

  • Accuracy for calling breakpoints, sizes, and genotypes of SVs: We evaluated the accuracy with which each algorithm called breakpoints (BPs) and SV length using the simulated genome (Sim-A) data (Additional file 3: Table S14; see the “Methods” section for the root mean squared error (RMSE))
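As a rough illustration of the RMSE measure mentioned in the last highlight, the sketch below applies the standard root-mean-squared-error formula to breakpoint positions. The function name and the assumption that each called breakpoint has already been paired with its reference breakpoint are illustrative; the paper's exact procedure is given in its “Methods” section.

```python
import math


def breakpoint_rmse(called, reference):
    """Standard RMSE between paired called and reference positions (in bp).

    Assumes called[i] has already been matched to reference[i]; how that
    pairing is done is outside this sketch.
    """
    if len(called) != len(reference) or not called:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(c - r) ** 2 for c, r in zip(called, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up positions: each called breakpoint is 3 bp away from its reference.
print(breakpoint_rmse([103, 197], [100, 200]))
```

The same function could be applied to SV lengths instead of positions, since only the paired numeric values matter.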

Introduction

Genomic structural variations (SVs) are generally defined as deletions (DELs), insertions (INSs), duplications (DUPs), inversions (INVs), and translocations (TRAs) of at least 50 bp in size. SVs are often considered separately from small variants, including single nucleotide variants (SNVs) and short insertions and deletions (indels), because SVs and small variants are often formed by distinct mechanisms [1]. SVs are largely responsible for the diversity and evolution of human genomes at both the individual and population levels [3,4,5,6]. SVs could have a greater impact on gene function and phenotypic change than SNVs and short indels. SVs are associated with a number of human diseases, including neurodevelopmental disorders and cancers [3,8,9,10,11].
