Abstract

Discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provides an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events, and their genome-wide detection is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward due to non-uniform coverage of reads resulting from various systemic biases. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module, which enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform accessible even on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated and real data at different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.

Highlights

  • With the advent of high throughput sequencing techniques, there has been considerable interest in identifying population-specific structural variants (SVs), and their possible role in disease

  • In iCopyDAV, we provide two segmentation approaches: the Total Variation Minimization (TVM) algorithm, based on an agglomerative approach in which adjacent bins with similar read depth (RD) values are merged into larger segments, and the Circular Binary Segmentation (CBS) algorithm, based on a divisive approach in which genomic regions are divided into segments such that the bins within a segment have similar RD values

  • The predicted copy number variations (CNVs) in each are validated against 6 different studies reported in the Database of Genomic Variants (DGV) for the NA12878 sample
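
The idea of merging adjacent bins with similar RD values can be illustrated with a small sketch. This is not the TVM or CBS algorithm and is not taken from the iCopyDAV code base; it is a greedy, single-pass merge written only to show what "adjacent bins with similar RD are merged into larger segments" means. The function name `merge_similar_bins`, the tolerance parameter `tol`, and the toy RD values are all assumptions made for illustration.

```python
from typing import List, Tuple


def merge_similar_bins(rd: List[float], tol: float) -> List[Tuple[int, int, float]]:
    """Greedily merge adjacent bins whose RD lies within `tol` of the
    running mean of the current segment (hypothetical helper).

    Returns one (start_bin, end_bin_exclusive, mean_rd) tuple per segment.
    """
    if not rd:
        return []
    segments = []
    start, total = 0, rd[0]
    for i in range(1, len(rd)):
        mean = total / (i - start)  # mean RD of the segment built so far
        if abs(rd[i] - mean) <= tol:
            total += rd[i]  # similar enough: extend the current segment
        else:
            segments.append((start, i, mean))  # close it and start a new one
            start, total = i, rd[i]
    segments.append((start, len(rd), total / (len(rd) - start)))
    return segments


# Toy per-bin read depths: a baseline near 30 with a gain-like block near 60.
toy_rd = [30.0, 31.0, 29.0, 60.0, 62.0, 61.0, 30.0, 30.0]
for segment in merge_similar_bins(toy_rd, tol=5.0):
    print(segment)
```

On the toy input, the bins with elevated RD end up in their own segment, which is the kind of region a downstream variant-calling step would then examine as a candidate copy gain. Real segmentation methods choose breakpoints by optimizing a global criterion rather than by a fixed tolerance.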


Summary

Introduction

With the advent of high throughput sequencing techniques, there has been considerable interest in identifying population-specific structural variants (SVs) and their possible role in disease. The repair mechanism may misalign double-strand breaks (DSBs) with another homologous region, resulting in non-allelic homologous recombination (NAHR), which is aided by repeats and segmental duplications. When a DNA replication fork stalls, the lagging strand disengages from the DNA, shifts to another replicating DNA fragment and restarts replication there; this is termed fork stalling and template switching (FoSTeS) and is aided by repeats. This mechanism generates non-recurrent CNVs. Mobile element insertion (MEI) is a mechanism by which transposable elements copy themselves and insert at new locations that are usually flanked by inverted repeats (mediated by retrotransposons, DNA transposons and endogenous retroviruses) [2,3].
