Abstract

Background

Structural variation (SV) represents a significant, yet poorly understood, contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but no single detection tool is considered a community standard. In an attempt to fulfil this need, we developed SoftSearch, an algorithm for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SVs, including split reads, discordant read pairs, and unmated pairs. Co-localized split reads and discordant read pairs are used to refine the breakpoints.

Results

We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are that it 1) does not require secondary (or exhaustive primary) alignment, 2) can be ported into established sequencing workflows, and 3) is applicable to any DNA-sequencing experiment (e.g. whole genome, exome, or custom capture). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read pairs, which on their own would not be sufficient to make an SV call.

Conclusions

We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene that were not reported by other tools, while offering significantly improved overall performance.
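The three kinds of read-level evidence named in the abstract (soft-clipped split reads, discordant read pairs, and unmated pairs) can be illustrated with a minimal sketch. This is not SoftSearch's own code, which the abstract does not show: the `Read` record, the functions `soft_clip_breakpoint` and `classify_pair`, and the thresholds (`min_clip=5`, `n_sd=3.0`) are hypothetical, and the fields simply mirror the SAM columns POS, CIGAR, and TLEN so the example runs without a BAM-parsing library.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Read:
    """Hypothetical minimal record for one mate of an aligned read pair."""
    chrom: str
    pos: int                   # 1-based leftmost aligned reference position (SAM POS)
    cigar: str
    reverse: bool              # True if this read aligned to the reverse strand
    mate_chrom: Optional[str]  # None if the mate is unmapped
    mate_reverse: bool
    tlen: int                  # signed template length (SAM TLEN)


def soft_clip_breakpoint(read: Read, min_clip: int = 5) -> Optional[Tuple[str, int]]:
    """Return (chrom, x) if a soft clip implies a junction between bases x-1 and x."""
    # Hard clips carry no sequence, so drop them before inspecting the read ends.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(read.cigar) if op != "H"]
    if not ops:
        return None
    # If both ends are clipped, this simplified sketch reports only the leading clip.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        # Leading clip: the junction sits just before the first aligned base.
        return (read.chrom, read.pos)
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Trailing clip: the junction sits just after the last aligned reference base.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        return (read.chrom, read.pos + ref_len)
    return None


def classify_pair(read: Read, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a read by the pair-level evidence it carries (simplified rules)."""
    if read.mate_chrom is None:
        return "unmated"                      # the mate failed to align
    if read.mate_chrom != read.chrom:
        return "discordant_interchromosomal"  # mates on different chromosomes
    if read.reverse == read.mate_reverse:
        return "discordant_orientation"       # both mates on the same strand
    if abs(read.tlen) > mean_insert + n_sd * sd_insert:
        return "discordant_insert_size"       # unusually large span, e.g. a deletion
    return "concordant"


r = Read("chr13", 1000, "12S88M", False, "chr13", True, 2100)
print(soft_clip_breakpoint(r))                          # ('chr13', 1000)
print(classify_pair(r, mean_insert=350, sd_insert=50))  # discordant_insert_size
```

In this sketch a single read can contribute both a soft-clip position and a pair-level label; neither on its own is treated as a call, which matches the abstract's point that such weak signals only become useful when combined.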

Highlights

  • Many patients at high risk for developing cancer have a negative finding from mutation screening [1]

  • A recent study has suggested that SVs in the BRCA1 and BRCA2 genes could account for as much as 18% of all BRCA mutations [5]; many of these are likely causative of cancer susceptibility in the families in which they were identified [6], and their detection is recommended for clinical practice [6]

  • The significantly higher recall obtained with SoftSearch on the 4x-coverage HapMap data highlights that the SoftSearch strategy of combining multiple sequence features to call breakpoints is more effective than relying on a single feature alone
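The idea of combining weak, individually insufficient signals into one breakpoint call can be sketched as a toy co-localization rule. This is an illustrative assumption, not SoftSearch's actual scoring: the function `call_breakpoints`, the `BreakpointCall` record, and the parameters `window=500`, `min_split=2`, and `min_disc=1` are hypothetical. Inputs are plain `(chrom, pos)` tuples for soft-clip junctions and discordant-read positions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BreakpointCall:
    chrom: str
    pos: int
    split_reads: int
    discordant_pairs: int


def call_breakpoints(clip_sites: List[Tuple[str, int]],
                     discordant_sites: List[Tuple[str, int]],
                     window: int = 500,
                     min_split: int = 2,
                     min_disc: int = 1) -> List[BreakpointCall]:
    """Report a breakpoint only where split-read and discordant-pair evidence co-localize."""
    # Soft clips from reads spanning the same junction pile up at one coordinate,
    # so identical (chrom, pos) entries are counted together.
    clip_counts = Counter(clip_sites)
    calls = []
    for (chrom, pos), n_split in sorted(clip_counts.items()):
        if n_split < min_split:
            continue
        # A discordant read supports the junction if it maps within `window` bp of it.
        n_disc = sum(1 for c, p in discordant_sites
                     if c == chrom and abs(p - pos) <= window)
        if n_disc >= min_disc:
            calls.append(BreakpointCall(chrom, pos, n_split, n_disc))
    return calls


clips = [("chr13", 1000)] * 2 + [("chr13", 5000)] * 3
discordant = [("chr13", 800), ("chr13", 950)]
for call in call_breakpoints(clips, discordant):
    print(call)
# BreakpointCall(chrom='chr13', pos=1000, split_reads=2, discordant_pairs=2)
```

Here the site at position 5000 is not reported despite three clipped reads, because no discordant pair maps nearby, while the site at 1000 is reported from only two clipped reads plus two discordant pairs. Real thresholds would need tuning to coverage and library insert size.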


Summary

Introduction

Many patients at high risk for developing cancer have a negative finding from mutation screening [1]. Women with a family history of breast or ovarian cancer who carry point mutations in the BRCA1 or BRCA2 genes are clinically recognized to have a high risk of developing breast cancer. SV discovery in disease genes like BRCA1 and BRCA2 required gene-specific probes to amplify and quantify genomic DNA structure and copy number, which made it difficult to identify new genes contributing to breast cancer risk through mechanisms such as SV. SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read pairs, which on their own would not be sufficient to make an SV call. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene that were not reported by other tools, while offering significantly improved overall performance.

