Abstract

Recent advances in high-throughout sequencing technologies have made it possible to accurately assign copy number (CN) at CN variable loci. However, current analytic methods often perform poorly in regions in which complex CN variation is observed. Here we report the development of a read depth-based approach, CNVrd2, for investigation of CN variation using high-throughput sequencing data. This methodology was developed using data from the 1000 Genomes Project from the CCL3L1 locus, and tested using data from the DEFB103A locus. In both cases, samples were selected for which paralog ratio test data were also available for comparison. The CNVrd2 method first uses observed read-count ratios to refine segmentation results in one population. Then a linear regression model is applied to adjust the results across multiple populations, in combination with a Bayesian normal mixture model to cluster segmentation scores into groups for individual CN counts. The performance of CNVrd2 was compared to that of two other read depth-based methods (CNVnator, cn.mops) at the CCL3L1 and DEFB103A loci. The highest concordance with the paralog ratio test method was observed for CNVrd2 (77.8/90.4% for CNVrd2, 36.7/4.8% for cn.mops and 7.2/1% for CNVnator at CCL3L1 and DEF103A). CNVrd2 is available as an R package as part of the Bioconductor project: http://www.bioconductor.org/packages/release/bioc/html/CNVrd2.html.

Highlights

  • Copy number variation (CNV) encompassing genes is a common phenomenon in the human genome, and has been shown to be associated with variation in phenotype (Gonzalez et al, 2005; Freeman et al, 2006; McKinney et al, 2008, 2010, 2012; Bentley et al, 2010)

  • CNVrd2 was designed to correct issues that can occur during the segmentation process which can result in CNVrd assigning a segmentation score of zero, resulting in some samples being assigned to unreasonable copy number (CN) groups

  • In CNVrd2, a linear regression model was used to adjust the differentiation in CN between populations

Read more

Summary

Introduction

Copy number variation (CNV) encompassing genes is a common phenomenon in the human genome, and has been shown to be associated with variation in phenotype (Gonzalez et al, 2005; Freeman et al, 2006; McKinney et al, 2008, 2010, 2012; Bentley et al, 2010). One population having clear clusters of segmentation scores was used to obtain prior information for all populations: the European and South Asian population for CCL3L1 and DEFB103A, respectively.

Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call