Abstract

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to detect and characterize them accurately. In principle, allele-specific reads from high-throughput sequencing data could be leveraged both to enhance CNV detection and to produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.

Highlights

  • Copy-number variants (CNVs) are a major form of genetic variation in mammals [1,2,3,4] and a risk factor for various human diseases [5,6,7,8,9,10,11]

  • We have developed an integrated and novel method (AS-GENSENG) that exploits the rich information in both total read count (TReC) and allele-specific read count (ASReC) to detect CNVs and allele-specific CNVs (ASCNVs) from both whole-genome sequence (WGS) and whole-exome sequence (WES) data

  • Analogous to the previous success with array-based CNV calling, we have demonstrated that joint analysis of total read count (TReC) and allele-specific read count (ASReC) allows the estimation of allele-specific copy number (ASCN) and improves the estimation of total copy number (e.g. 1-copy deletions and 3-copy duplications)
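
To make the idea of joint analysis concrete, the following is a minimal toy sketch, not the AS-GENSENG model itself. It assumes a Poisson model for the total read count in a window and a binomial model for the allele-specific reads at heterozygous sites, and it ignores the experimental biases that the method accounts for. All function names, parameters and the expected depth per copy are hypothetical and chosen only for illustration.

```python
from math import lgamma, log


def poisson_log_likelihood(observed_reads, expected_reads):
    """Log-likelihood of a total read count under a Poisson model (toy assumption)."""
    return observed_reads * log(expected_reads) - expected_reads - lgamma(observed_reads + 1)


def allelic_log_likelihood(reads_a, reads_b, copies_a, copies_b, error=0.01):
    """Binomial log-likelihood of allele-specific reads given an allelic split.

    The binomial coefficient is omitted because it is identical for every
    candidate split. `error` keeps the expected fraction away from 0 and 1
    so that a few stray reads do not produce log(0).
    """
    fraction_a = copies_a / (copies_a + copies_b)
    fraction_a = min(max(fraction_a, error), 1.0 - error)
    return reads_a * log(fraction_a) + reads_b * log(1.0 - fraction_a)


def joint_call(total_reads, reads_per_copy, reads_a, reads_b, max_copies=4):
    """Return the (copies_a, copies_b) pair with the highest joint log-likelihood.

    Homozygous deletions (zero copies) are left out of this sketch.
    """
    best_split, best_score = None, float("-inf")
    for total_copies in range(1, max_copies + 1):
        depth_term = poisson_log_likelihood(total_reads, total_copies * reads_per_copy)
        for copies_a in range(total_copies + 1):
            copies_b = total_copies - copies_a
            score = depth_term + allelic_log_likelihood(reads_a, reads_b, copies_a, copies_b)
            if score > best_score:
                best_split, best_score = (copies_a, copies_b), score
    return best_split


# Clear duplication: depth is 3x the per-copy expectation and alleles are 2:1.
print(joint_call(total_reads=150, reads_per_copy=50.0, reads_a=40, reads_b=20))
# Ambiguous depth (between 2 and 3 copies) resolved by balanced allele counts.
print(joint_call(total_reads=125, reads_per_copy=50.0, reads_a=30, reads_b=30))
```

In the second call the total depth alone sits between two and three copies, and the balanced allele counts tip the call toward a normal diploid state. This is the kind of gain from joint analysis that the highlight describes, shown here under much simpler assumptions than those of a production caller.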

Introduction

Copy-number variants (CNVs) are a major form of genetic variation in mammals [1,2,3,4] and a risk factor for various human diseases [5,6,7,8,9,10,11]. CNV assessment is important in functional genomic studies, since failing to account for copy-number differences can result in misinterpretation of data from RNA-seq, chromatin immunoprecipitation (ChIP-seq), DNase-hypersensitive site mapping (DNase-seq) or formaldehyde-assisted isolation of regulatory elements (FAIRE-seq) [16,17]. For these reasons, accurate detection of CNVs is of paramount importance, and allele-specific copy number (ASCN) calls are highly desirable because it is important to know how CNVs are allocated in diploid organisms [18,19]. Allele-specific CNV calls provide crucial additional information for disease studies
