Abstract

BackgroundCopy number variation (CNV) is essential to understand the pathology of many complex diseases at the DNA level. Affymetrix SNP arrays, which are widely used for CNV studies, significantly depend on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization and training sample batch, as well as genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Since many available algorithms only address one or two of the three factors, a high false discovery rate (FDR) often results when identifying CNV. Therefore, we have developed a new CNV detection pipeline which is based on hybridization and amplification rate correction (CNVhac).MethodsCNVhac first estimates the allelic concentrations (ACs) of target sequences by using the sample independent parameters trained through physicochemical hybridization law. Then the raw CN is estimated by taking the ratio of AC to the corresponding average AC from a reference sample set for one specific site. Finally, a hidden Markov model (HMM) segmentation process is implemented to detect CNV regions.ResultsBased on public HapMap data, the results show that CNVhac effectively smoothes the genomic waves and facilitates more accurate raw CN estimates compared to other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and makes CNV calling with appreciable low FDRs.ConclusionCNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.

Highlights

  • Copy number variation (CNV) is essential to understand the pathology of many complex diseases at the DNA level

  • We developed a novel CNV detection pipeline based on hybridization and amplification rate correction (CNVhaca) to accurately detect CNVs for Affymetrix SNP array

  • To assess the performance of CNV calling, we compare CNVhac with another popular approach known as Birdsuite [13], which is asserted to be the best for CNV inference with Affymetrix SNP arrays [5]

Read more

Summary

Introduction

Copy number variation (CNV) is essential to understand the pathology of many complex diseases at the DNA level. Affymetrix SNP arrays, which are widely used for CNV studies, significantly depend on accurate copy number (CN) estimation. CN estimation may be biased by several factors, including cross-hybridization and training sample batch, as well as genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Many methods have been developed for CNV calling from array-based data [7,13,14,15,16], their accuracies are still far from satisfactory by the high incidence of false discovery rates (FDRs). The high FDRs of these methods mainly arise from (1) cross-hybridization of probes [20], (2) genomic waves of intensities [21,22,23] and (3) sample dependence of outputs [24,25,26]

Objectives
Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.