Abstract

Existing methods for identifying structural variants (SVs) from short-read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly, and SoftSearch), and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.

Highlights

  • Reads from all individuals included in joint calling that are soft or hard clipped are hashed by position to identify shared breakpoints

  • The soft-clipped sequences that overhang the breakpoint are collapsed into a consensus sequence using the multiple sequence alignment (MSA) implementation provided in the SeqAn library [27]
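
The breakpoint hashing and consensus steps described in the two items above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Wham's implementation: the read tuple layout, the function names, and the `min_support` threshold are hypothetical, and a simple column-wise majority vote over overhangs anchored at the breakpoint stands in for the SeqAn multiple sequence alignment [27].

```python
from collections import Counter, defaultdict

REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clipped_ends(ref_start, cigar, seq):
    """Yield (breakpoint_pos, side, overhang) for each clipped end of one read.

    cigar is a list of (op, length) tuples. Only the outermost operation on
    each side is inspected (a simplification). Hard-clipped bases are absent
    from the read sequence, so their overhang is the empty string.
    """
    if len(cigar) < 2:
        return
    op, n = cigar[0]
    if op in ("S", "H"):
        overhang = seq[:n] if op == "S" else ""
        yield ref_start, "left", overhang
    op, n = cigar[-1]
    if op in ("S", "H"):
        aligned = sum(length for o, length in cigar if o in REF_CONSUMING)
        overhang = seq[-n:] if op == "S" else ""
        yield ref_start + aligned, "right", overhang


def shared_breakpoints(reads, min_support=2):
    """Hash clipped read ends from all samples by (chrom, pos, side)."""
    table = defaultdict(list)
    for sample, chrom, ref_start, cigar, seq in reads:
        for pos, side, overhang in clipped_ends(ref_start, cigar, seq):
            table[(chrom, pos, side)].append((sample, overhang))
    # Keep only positions supported by enough clipped reads (assumed cutoff).
    return {k: v for k, v in table.items() if len(v) >= min_support}


def consensus(overhangs, side):
    """Majority-vote consensus of overhangs, anchored at the breakpoint."""
    seqs = [s for s in overhangs if s]
    if side == "left":
        seqs = [s[::-1] for s in seqs]  # left overhangs end at the breakpoint
    columns = []
    for i in range(max((len(s) for s in seqs), default=0)):
        votes = Counter(s[i] for s in seqs if i < len(s))
        columns.append(votes.most_common(1)[0][0])
    result = "".join(columns)
    return result[::-1] if side == "left" else result


# Toy input: three reads from two samples share a right-clipped end at 105.
reads = [
    ("s1", "chr1", 100, [("M", 5), ("S", 4)], "ACGTAGGCC"),
    ("s2", "chr1", 102, [("M", 3), ("S", 3)], "GTAGGC"),
    ("s2", "chr1", 101, [("M", 4), ("S", 4)], "CGTAGGTC"),
    ("s1", "chr1", 200, [("M", 8)], "ACGTACGT"),
]
for (chrom, pos, side), support in shared_breakpoints(reads).items():
    overhangs = [overhang for _, overhang in support]
    print(chrom, pos, side, len(support), consensus(overhangs, side))
```

A gapless vote is only reasonable here because every overhang starts (or ends) at the same breakpoint; a real MSA additionally tolerates indels within the overhangs.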

Introduction

Structural variation (SV) is a major source of phenotypic diversity [1,2,3,4] and human disease [5,6,7]. Detecting SVs in short-read sequence data is challenging [8], and using SVs in association studies remains problematic, primarily because of three technical difficulties. First, SV callers suffer from both high false-positive and high false-negative rates [5]. Second, SV breakpoints are highly variable, making it difficult to detect an association between a phenotype and a complex ensemble of overlapping SVs [9]. Third, to our knowledge, no existing SV detection software can identify SV enrichment in cases versus controls within a framework amenable to high-throughput sequence analysis. Wham (Whole-genome Alignment Metrics) effectively addresses these problems.

