SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions.

Hoang T Nguyen, Michael A Black, James Boocock, Tony R Merriman

doi:10.3389/fgene.2016.00160

Abstract

Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints, which helps in understanding genome structure and its evolution. Various approaches have been proposed for detecting CNV breakpoints, but it remains challenging for tools based on a single analysis method to identify them. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps involve using a normal mixture model to cluster samples into different groups, followed by simple kernel-based approaches to maximize information obtained from read-depth and split-read approaches, after which common breakpoints of groups are inferred. The pipeline uses split-read information directly from CIGAR strings of BAM files, without using a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples, including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM), the pipeline obtained good concordance with results from the 1000 Genomes Project (92, 100, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a Docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
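The abstract states that split-read information is taken directly from CIGAR strings rather than from a re-alignment step. The following is a minimal sketch of that general idea, not the SRBreak implementation: it assumes that soft-clipped ends (CIGAR operation `S`) mark candidate breakpoints, and the function names and the `min_clip` threshold are hypothetical. Alignments are represented as plain `(POS, CIGAR)` pairs so that the example needs no BAM-reading library.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def clip_positions(pos, cigar, min_clip=10):
    """Return reference coordinates at which a read is soft-clipped.

    pos is the 1-based leftmost aligned position (SAM POS field).
    min_clip is a hypothetical threshold used to ignore short clips.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no sequence, so drop them before checking the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    candidates = []
    if not core:
        return candidates
    # Leading soft clip: the candidate is the first aligned base.
    if core[0][1] == "S" and core[0][0] >= min_clip:
        candidates.append(pos)
    # Trailing soft clip: the candidate is the last aligned base.
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        candidates.append(pos + ref_len - 1)
    return candidates


def breakpoint_support(alignments, min_clip=10):
    """Count how many reads support each candidate coordinate."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_positions(pos, cigar, min_clip))
    return counts


# Toy alignments (POS, CIGAR); the values are illustrative only.
reads = [(100, "60M40S"), (120, "40M60S"), (160, "35S65M"), (90, "100M")]
support = breakpoint_support(reads)
```

In this toy input, two reads are clipped after reference position 159 and one read is clipped before position 160, so the counts cluster around a single junction. How such per-read evidence is pooled across samples (the kernel-based step mentioned in the abstract) is not shown here.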

Highlights

Copy number variation (CNV) has been associated with increased risk of complex diseases such as autism, HIV, Crohn’s disease, rheumatoid arthritis, epilepsy, bipolar disorder, Alzheimer’s disease, and obesity (Gonzalez et al, 2005; McCarroll et al, 2008; Bentley et al, 2009; McKinney et al, 2010; Chung et al, 2014; Falchi et al, 2014; Hooli et al, 2014; Olson et al, 2014; Green et al, 2016).
Concordance rate (CR) = (number of structural variations (SVs) called by a pipeline) / (number of true SVs).
To calculate the true positive rate (TPR) and false discovery rate (FDR), we focused on results flanking the simulated copy-number variable regions (CNVRs).
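The evaluation measures named above can be illustrated with a short sketch. It rests on several assumptions that are not stated in this text: a called breakpoint counts as a true positive if it lies within a tolerance `tol` of an unmatched true breakpoint (10 bp here, borrowed from the ±10 bp figure in the abstract), TPR and FDR use their standard definitions TP/(TP+FN) and FP/(TP+FP), and CR is taken literally as the ratio given above. The function names are hypothetical.

```python
def match_breakpoints(called, truth, tol=10):
    """Pair each true breakpoint with at most one unused call within tol bp.

    Returns (true positives, false positives, false negatives).
    """
    unused = sorted(called)
    tp = 0
    for t in sorted(truth):
        # First unused call close enough to this true breakpoint, if any.
        hit = next((c for c in unused if abs(c - t) <= tol), None)
        if hit is not None:
            unused.remove(hit)
            tp += 1
    fp = len(unused)        # calls that matched no true breakpoint
    fn = len(truth) - tp    # true breakpoints that no call matched
    return tp, fp, fn


def rates(called, truth, tol=10):
    """Compute CR, TPR and FDR for one set of calls."""
    tp, fp, fn = match_breakpoints(called, truth, tol)
    nan = float("nan")
    return {
        # Literal reading of the CR definition: calls / true events.
        "CR": len(called) / len(truth) if truth else nan,
        "TPR": tp / (tp + fn) if truth else nan,
        "FDR": fp / (tp + fp) if called else nan,
    }


# Toy coordinates; the values are illustrative only.
truth = [1000, 5000, 9000]
called = [1004, 5020, 8995, 12000]
result = rates(called, truth)
```

With these toy values, two calls fall within 10 bp of a true breakpoint, two do not, and one true breakpoint is missed, so TPR is 2/3 and FDR is 0.5. Because the literal CR counts all calls, it can exceed 1 when a pipeline over-calls; restricting the numerator to matched calls would make it equal to TPR.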

Summary

Introduction

Copy number variation (CNV) has been associated with increased risk of complex diseases such as autism, HIV, Crohn’s disease, rheumatoid arthritis, epilepsy, bipolar disorder, Alzheimer’s disease, and obesity (Gonzalez et al, 2005; McCarroll et al, 2008; Bentley et al, 2009; McKinney et al, 2010; Chung et al, 2014; Falchi et al, 2014; Hooli et al, 2014; Olson et al, 2014; Green et al, 2016). In addition, CNV at the CCL3L1 locus has been associated with selective adaptation (Gonzalez et al, 2005; Perry et al, 2007; Hardwick et al, 2011, 2014). Such CNV-disease relationships are difficult to detect and replicate for a number of reasons (He et al, 2009; Shrestha et al, 2010; Carpenter et al, 2011; Nordang et al, 2012; Aklillu et al, 2013). Precise identification of the breakpoints of duplication or deletion events could enhance our understanding of the exact structure of regions carrying the CN variants, and the subsequent functional impact on biological pathways. These exact breakpoints would be amenable to direct genotyping for surrogate measurement of CNV.

Journal: Frontiers in Genetics
Publication Date: Sep 15, 2016
Citations: 9
License type: CC BY
