Abstract

The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore non-trivial signatures presented by variants containing 3 breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can result in lower precision and sensitivity in the identification of the more common structural variants such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split-reads and read depth information to improve upon existing methods. We show that SVXplorer is more sensitive and precise than several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.

Highlights

  • Structural variants (SVs), which include regions of genomic imbalances called copy number variants (CNVs) and balanced rearrangements such as inversions, account for the majority of varying bases in the human genome [1]

  • We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split-reads and read depth information to improve upon existing methods

  • Most methods to identify structural variants (SVs) focus on deletions, duplications, and inversions, which can be identified by integrating information from the coverage and insert length of aligned reads around the breakpoints

Summary

Introduction

Structural variants (SVs), which include regions of genomic imbalances called copy number variants (CNVs) and balanced rearrangements such as inversions, account for the majority of varying bases in the human genome [1]. Earlier methods were developed to harness only one source of evidence (for example, discordant paired-end alignments, split reads, or read depth), but more recent methods such as LUMPY [6], TIDDIT [7], and TARDIS [8] integrate multiple sources and typically outperform earlier methods [5]. Despite these developments, SV callers have varying accuracy for different classes of SVs, and most of them employ heuristics designed for the identification of specific SV types [9,10,11]. Most focus on detecting the signatures of individual SV types, often ignoring 3-breakpoint SVs and their signatures. Ignoring those signatures often leads to incorrect identification or annotation of common SVs such as deletions, duplications, and inversions. Even if read depth is used to discover the duplicated segment, such calls will be filtered out by a caller that considers discordant read pairs but does not account for overlaps among them.
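To make the idea of graph-based clustering of discordant read pairs concrete, the following is a minimal sketch and not SVXplorer's actual implementation. It treats each discordant pair as a node, links two pairs when they share a chromosome and orientation and both mates map close together, and reports connected components as clusters. The `DiscordantPair` record, the `cluster_pairs` function, and the `max_dist` threshold are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from typing import List, NamedTuple


class DiscordantPair(NamedTuple):
    """Hypothetical record for one discordant read pair (illustration only)."""
    chrom: str
    left: int          # aligned position of the first mate
    right: int         # aligned position of the second mate
    orientation: str   # e.g. "FR", "RF", "FF", "RR"


def cluster_pairs(pairs: List[DiscordantPair],
                  max_dist: int = 500) -> List[List[DiscordantPair]]:
    """Group pairs into connected components of a proximity graph.

    Two pairs are linked when they share chromosome and orientation and
    both mates lie within max_dist bases (an assumed threshold).
    """
    parent = list(range(len(pairs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Quadratic all-against-all comparison, kept simple for clarity.
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            a, b = pairs[i], pairs[j]
            if (a.chrom == b.chrom
                    and a.orientation == b.orientation
                    and abs(a.left - b.left) <= max_dist
                    and abs(a.right - b.right) <= max_dist):
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i, pair in enumerate(pairs):
        clusters[find(i)].append(pair)
    return list(clusters.values())
```

A real caller would sort alignments by position to avoid the quadratic comparison and would derive the distance threshold from the library's insert-size distribution. Handling 3-breakpoint variants would additionally require relating clusters to one another, which this sketch does not attempt.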

