Abstract

Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities.

Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines.

Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • A typical human genome differs from the reference genome at ~4–5 million sites amounting to ~20 million altered bases (1000 Genomes Project Consortium, 2015)

  • These variations can be categorized into single-nucleotide polymorphisms (SNPs), small insertions and deletions (Indels) and structural variation (SV) affecting a larger number of base pairs

  • Our results demonstrate that SVIM reaches substantially higher recall and precision than existing tools for SV detection from long reads

Summary

Introduction

A typical human genome differs from the reference genome at ~4–5 million sites amounting to ~20 million altered bases (1000 Genomes Project Consortium, 2015). These variations can be categorized into single-nucleotide polymorphisms (SNPs), small insertions and deletions (Indels) and structural variation (SV) affecting a larger number of base pairs. Studies have shown that in humans more base pairs are altered due to SV than due to SNPs (Redon et al., 2006; Weischenfeldt et al., 2013). SVs have a major influence on human diversity and are implicated in a wide range of diseases from autism and other neurological diseases to cancer and obesity (Sebat et al., 2007; Weischenfeldt et al., 2013). The characterization of SVs is of major importance to human medicine and genetics alike.

