Abstract

Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints, which aids understanding of genome structure and its evolution. Various approaches have been proposed for detecting CNV breakpoints, but identifying them remains challenging for tools based on a single analysis method. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps involve using a normal mixture model to cluster samples into different groups, followed by simple kernel-based approaches to maximize the information obtained from read-depth and split-read approaches, after which common breakpoints of groups are inferred. The pipeline uses split-read information directly from the CIGAR strings of BAM files, without a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples, including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM), the pipeline obtained good concordance with results from the 1000 Genomes Project (92, 100, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
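To illustrate how split-read evidence can be read directly from CIGAR strings without re-alignment, the following is a minimal sketch and not the SRBreak implementation. It assumes that a soft or hard clip at either end of an alignment marks a candidate breakpoint; the function names `clip_breakpoints` and `tally_breakpoints` and the example reads are hypothetical.

```python
import re
from collections import Counter

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_breakpoints(pos, cigar):
    """Return candidate breakpoints (1-based reference coordinates)
    implied by clipped read ends.

    pos   -- 1-based leftmost mapped position (SAM POS field)
    cigar -- CIGAR string, e.g. "60M40S"

    Assumption for this sketch: a clip (S or H) at the start of the
    alignment places a candidate at the first aligned base, and a clip
    at the end places one at the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if not ops:
        return []  # unmapped read or missing CIGAR ("*")
    candidates = []
    if ops[0][1] in "SH":
        candidates.append(pos)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if len(ops) > 1 and ops[-1][1] in "SH":
        candidates.append(pos + ref_len - 1)
    return candidates


def tally_breakpoints(reads):
    """Count how many reads support each candidate position.

    reads -- iterable of (pos, cigar) pairs
    """
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_breakpoints(pos, cigar))
    return counts


if __name__ == "__main__":
    # Hypothetical reads around one event.
    reads = [(1000, "60M40S"), (1010, "50M50S"), (1060, "45S55M"), (900, "100M")]
    print(tally_breakpoints(reads).most_common())
```

Positions supported by clipped reads from several samples could then be smoothed or pooled, for example with a kernel-based step of the kind the abstract mentions; that step is not shown here.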

Highlights

  • Concordance rate (CR) = (number of structural variations (SVs) called by a pipeline) / (number of true SVs)

  • To calculate true positive rate (TPR) and false discovery rates (FDRs), we focused on results flanking the simulated copy-number variable regions (CNVRs)
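The metrics in these highlights can be sketched as follows. This is an illustrative calculation, not the authors' evaluation code. It assumes the standard definitions TPR = TP / (TP + FN) and FDR = FP / (TP + FP), and it assumes that a called breakpoint counts as a true positive when it lies within a tolerance of a true breakpoint (10 bp here, borrowed from the ±10 bp resolution stated in the abstract). The greedy one-to-one matching rule and all function names are hypothetical.

```python
def match_breakpoints(called, truth, tol=10):
    """Greedily match called positions to true positions within tol bp.

    Each true breakpoint can be matched at most once.
    Returns (tp, fp, fn).
    """
    unmatched = sorted(truth)
    tp = 0
    for c in sorted(called):
        hit = next((t for t in unmatched if abs(t - c) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(called) - tp
    fn = len(unmatched)
    return tp, fp, fn


def tpr_fdr(called, truth, tol=10):
    """True positive rate and false discovery rate for one call set."""
    tp, fp, fn = match_breakpoints(called, truth, tol)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return tpr, fdr


def concordance_rate(n_called, n_true):
    """CR as defined in the highlights: SVs called by a pipeline / true SVs."""
    return n_called / n_true


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    called = [100, 205, 500]
    truth = [95, 200, 300]
    print(match_breakpoints(called, truth))
    print(tpr_fdr(called, truth))
```

A stricter or looser tolerance changes which calls count as matches, so the tolerance should be reported alongside any TPR or FDR value.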

Introduction

Copy number variation (CNV) has been associated with increased risk of complex diseases such as autism, HIV, Crohn’s disease, rheumatoid arthritis, epilepsy, bipolar disorder, Alzheimer’s disease, and obesity (Gonzalez et al, 2005; McCarroll et al, 2008; Bentley et al, 2009; McKinney et al, 2010; Chung et al, 2014; Falchi et al, 2014; Hooli et al, 2014; Olson et al, 2014; Green et al, 2016). In addition, CNV at the CCL3L1 locus has been associated with selective adaptation (Gonzalez et al, 2005; Perry et al, 2007; Hardwick et al, 2011, 2014). Such CNV-disease relationships are difficult to detect and replicate for a number of reasons (He et al, 2009; Shrestha et al, 2010; Carpenter et al, 2011; Nordang et al, 2012; Aklillu et al, 2013). Precise identification of the breakpoints of duplication or deletion events could enhance our understanding of the exact structure of regions carrying the CN variants, and of the subsequent functional impact on biological pathways. These exact breakpoints would be amenable to direct genotyping for surrogate measurement of CNV.
