Abstract
Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most existing methods depend on sequencing depth and are unstable at low sequence coverage. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low-coverage whole-genome sequencing (LCS).
Methodology/Principal Findings: In our method, a two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to keep false positive control stable. On simulated data, PSCC achieved sensitivity/specificity of 99.7%/100% and 98.6%/100% for calling CNVs larger than 300 kb under LCS (∼2×) and ultra-LCS (∼0.2×), respectively. Finally, we applied the method to 34 clinical samples sequenced at an average of 2× coverage. All 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, a performance comparison showed that our method had significant advantages over existing methods when using ultra-LCS data.
Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low-coverage or even ultra-low-coverage data through population-scale sequencing.
Highlights
Copy number variations (CNVs) are known to be an important component of structural variation in the human genome, resulting from a mixture of meiotic recombination, homology-directed and non-homologous repair of double-strand breaks, and errors in replication [1].
Conclusions/Significance: Our study showed that population-scale CNV calling (PSCC) can sensitively and reliably detect CNVs using low-coverage or even ultra-low-coverage data through population-scale sequencing.
Observation window selection: To account for differences in sequenced and mapped reads across regions of the human genome, we used adjustable sliding windows along the genome to calculate read count (RC) statistics.
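The sliding-window read count described above can be illustrated with a minimal sketch. This is not the PSCC implementation: the function name `window_read_counts`, its parameters, and the choice to count a read by its start position are all assumptions made for illustration, and the window size and step are left as free parameters to reflect the "adjustable" windows.

```python
from bisect import bisect_left


def window_read_counts(read_starts, chrom_length, window_size, step):
    """Count reads per sliding window (hypothetical helper, not from PSCC).

    A read is assigned to a window if its start position falls in the
    half-open interval [start, start + window_size). Returns a list of
    (window_start, window_end, read_count) tuples.
    """
    positions = sorted(read_starts)
    counts = []
    # Slide the window along the chromosome; the last window starts at or
    # before chrom_length - window_size.
    last_start = max(chrom_length - window_size, 0)
    for start in range(0, last_start + 1, step):
        end = start + window_size
        # Binary search gives the number of sorted positions below each bound.
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counts.append((start, end, n))
    return counts


# Toy example: six read start positions on a 50 bp "chromosome".
toy_counts = window_read_counts([5, 12, 18, 25, 31, 47], 50, 20, 10)
for window_start, window_end, rc in toy_counts:
    print(window_start, window_end, rc)
```

In practice, the window size and step would be tuned to the sequencing depth so that each window holds enough reads for stable statistics; the values here are toy numbers only.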
Summary
Copy number variations (CNVs) are known to be an important component of structural variation in the human genome, resulting from a mixture of meiotic recombination, homology-directed and non-homologous repair of double-strand breaks, and errors in replication [1]. Common CNVs present in more than 1% of the population are defined as copy number polymorphisms (CNPs). These polymorphisms may contribute to phenotypic variation and to differences in disease susceptibility across ethnic groups [6,7]. CNVs represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low-coverage whole-genome sequencing (LCS).