Abstract

Copy number variations (CNVs) are associated with many complex diseases. Next-generation sequencing data enable one to identify precise CNV breakpoints, both to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints; when the breakpoints lie in a repeated region, the method reports a range within which the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and the CIGAR strings of the reads that cover those breakpoints. A read with a long soft-clipped part (denoted S in CIGAR) at its 3′ (right) end can be used to identify the breakpoint on the 5′ (left) side, and a read with a long S part at its 5′ end can be used to identify the breakpoint on the 3′ side. To ensure that both types of reads cover the same CNV, we require the overlapped common string to include both of the soft-clipped parts. When a CNV starts and ends in the same repeated region, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the method in a C++ package intended for the current Illumina MiSeq and HiSeq platforms, for both whole-genome and exon sequencing. Our simulation studies show that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application show that the detected CNVs are consistent with zygosity and read depth information. The software package is available at http://statgene.med.upenn.edu/softprog.html.
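The idea of reporting the leftmost breakpoint together with a sliding range can be illustrated for a simple deletion. The following is a minimal sketch in Python, not the package's C++ implementation; the function name `leftmost_deletion`, the 0-based coordinates and the toy reference sequence are assumptions made for illustration only.

```python
from typing import Tuple


def leftmost_deletion(ref: str, start: int, length: int) -> Tuple[int, int]:
    """Left-align a deletion of ref[start:start+length] (0-based, hypothetical API).

    Returns (leftmost_start, slide): the deletion can begin at any position in
    leftmost_start .. leftmost_start + slide and yield the same variant sequence.
    """
    # Moving the deletion one base to the left leaves the variant sequence
    # unchanged only if the base entering the window on the left equals the
    # base leaving the window on the right.
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1

    # From the leftmost position, count how far the window can move right.
    slide = 0
    while (start + slide + length < len(ref)
           and ref[start + slide] == ref[start + slide + length]):
        slide += 1
    return start, slide


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the variant sequence after deleting ref[start:start+length]."""
    return ref[:start] + ref[start + length:]


# Toy example: delete one "AC" unit inside an (AC)4 repeat.
ref = "GGACACACACTT"
leftmost, slide = leftmost_deletion(ref, start=4, length=2)
print(leftmost, slide)
```

Every start position in the reported range produces the same variant sequence, which is why a single leftmost position plus a range is enough to describe the ambiguity.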

Highlights

  • Copy number variation (CNV) is a type of genomic structural variation in which a segment of a chromosome is duplicated, deleted or inserted and therefore has an unusual number of copies (Freeman et al, 2006)

  • To demonstrate the efficiency and limitations of our method, we evaluated the performance of MATCHCLIP on simulated sequence reads that incorporated the CNVs published by the 1000 Genomes Project (Mills et al, 2011)

  • We assert that the two reads originate from the same CNV’s junction region by requiring that they overlap in a polarized way, with the type MS read on the left and the type SM read on the right
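The polarized overlap requirement can be sketched as a string comparison between the two reads. The Python example below is one plausible reading of the rule, not the published implementation: it assumes a type MS read is matched and then soft-clipped, a type SM read is soft-clipped and then matched, the overlap must be an exact match (no mismatches tolerated), and the function name `polarized_overlap` and the toy sequences are hypothetical.

```python
from typing import Optional


def polarized_overlap(ms_seq: str, ms_match_len: int,
                      sm_seq: str, sm_clip_len: int) -> Optional[int]:
    """Return the offset of sm_seq relative to ms_seq, or None.

    A valid overlap places the MS read on the left and the SM read on the
    right, and the shared string must contain the soft-clipped tail of the
    MS read and the soft-clipped head of the SM read.
    """
    # The SM read may start no later than the MS clip, otherwise the
    # overlap would not contain the whole MS clip.
    for offset in range(ms_match_len + 1):
        overlap = min(len(ms_seq) - offset, len(sm_seq))
        if overlap < sm_clip_len:
            continue  # overlap does not contain the SM clip
        if offset + overlap < len(ms_seq):
            continue  # SM read ends before the MS read does
        if ms_seq[offset:offset + overlap] == sm_seq[:overlap]:
            return offset
    return None


# Toy junction on the variant allele: 11 bases left of the breakpoint,
# 11 bases right of it.
junction = "ACGTACGGTCA" + "TTGACCGATAG"
ms_read = junction[3:17]   # 8 matched bases, then a 6-base clip
sm_read = junction[6:22]   # 5-base clip, then 11 matched bases
print(polarized_overlap(ms_read, 8, sm_read, 5))
```

Swapping the two reads, or pairing the MS read with an unrelated sequence, returns `None` under these assumptions, which is the behavior the polarity requirement is meant to enforce.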


Summary

Introduction

Copy number variation (CNV) is a type of genomic structural variation in which a segment of a chromosome is duplicated, deleted or inserted and therefore has an unusual number of copies (Freeman et al, 2006). Read depth-based methods often assume uniform fragmentation of the chromosomes, and paired-end-based methods assume effective size selection. These two kinds of methods are very powerful in detecting the existence of CNVs but are not precise about the exact start and end locations. To locate the breakpoints accurately, down to single-base resolution, knowledge of the sequence in the vicinity of the CNV on the variant allele is required. This can be obtained either by local assembly of the short reads into a consensus sequence (Alkan et al, 2011), followed by comparison with the reference, or by looking for reads that span the breakpoints. Split-read methods are based on the fact that reads covering the CNV breakpoints are split when mapped back to the reference genome sequence.
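In aligner output, a split read typically shows up as a soft-clipped segment (S) in its CIGAR string. The sketch below, written in Python for illustration, classifies a read as matched-then-clipped (MS) or clipped-then-matched (SM) and reports the reference position adjacent to the clip. It assumes 1-based SAM positions; the function names and the `min_clip` threshold of 10 bases are hypothetical choices, not values taken from the paper.

```python
import re
from typing import List, Optional, Tuple

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '60M40S' into [(60, 'M'), (40, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def classify_read(pos: int, cigar: str,
                  min_clip: int = 10) -> Optional[Tuple[str, int]]:
    """Return ('MS', last_aligned_pos) or ('SM', first_aligned_pos), else None.

    pos is the 1-based leftmost aligned reference position (SAM POS), which
    does not count soft-clipped bases. min_clip is an assumed threshold.
    """
    # Hard clips carry no sequence, so ignore them when looking at the ends.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return None
    # M, D, N, = and X are the operations that consume reference bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    (first_n, first_op), (last_n, last_op) = ops[0], ops[-1]

    if last_op == "S" and last_n >= min_clip and first_op != "S":
        # Clip at the right end: candidate for the left-side breakpoint.
        return "MS", pos + ref_len - 1
    if first_op == "S" and first_n >= min_clip and last_op != "S":
        # Clip at the left end: candidate for the right-side breakpoint.
        return "SM", pos
    return None


print(classify_read(1000, "60M40S"))
print(classify_read(5000, "35S65M"))
```

Reads that are fully matched, or that carry only short clips at both ends, return `None` here and would not contribute breakpoint candidates in this simplified scheme.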

