Abstract

The recent advent of third-generation sequencing technologies brings promise for better characterization of genomic structural variants by virtue of having longer reads. However, long-read applications are still constrained by their high sequencing error rates and low sequencing throughput. Here, we present NanoVar, an optimized structural variant caller utilizing low-depth (8X) whole-genome sequencing data generated by Oxford Nanopore Technologies. NanoVar exhibits higher structural variant calling accuracy when benchmarked against current tools using low-depth simulated datasets. In patient samples, we successfully validate structural variants characterized by NanoVar and uncover normal alternative sequences or alleles which are present in healthy individuals.

Highlights

  • Structural variations are implicated in the development of many human diseases [1, 2] and account for most of the genetic variation in the human population in terms of the number of nucleotides affected [3, 4]

  • The NanoVar workflow is a series of processes that uses 3GS long reads to discover and characterize structural variants (SVs) in DNA samples

  • Based on our initial tests performed in simulated datasets, we recommend having at least 12–24 Gb of sequencing data, which can be achieved through one to ten MinION runs depending on the flowcell chemistry (R9.4, R9.5), library preparation kit (1D, 1D², 2D), and DNA sample quality
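The recommended 12–24 Gb of data can be related to the low-depth (8X) figure in the abstract with simple arithmetic: mean depth is the total number of sequenced bases divided by the genome size. The sketch below is illustrative only. It assumes a human genome of roughly 3.1 Gb, a value not stated in the text, and the function names are hypothetical rather than part of NanoVar.

```python
# Assumption: approximate human genome size in gigabases (not given in the text).
HUMAN_GENOME_GB = 3.1


def mean_depth(total_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    if genome_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_gb / genome_gb


def required_gb(depth: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Sequencing yield (in Gb) needed to reach a target mean depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return depth * genome_gb


if __name__ == "__main__":
    # Depth implied by the lower and upper ends of the recommended range.
    for gb in (12, 24):
        print(f"{gb} Gb -> ~{mean_depth(gb):.1f}X")
    # Yield needed for the 8X depth mentioned in the abstract.
    print(f"8X -> ~{required_gb(8):.1f} Gb")
```

Under this assumed genome size, the upper end of the range (24 Gb) corresponds to just under 8X, and the lower end (12 Gb) to roughly 4X.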


Introduction

Structural variations are implicated in the development of many human diseases [1, 2] and account for most of the genetic variation in the human population in terms of the number of nucleotides affected [3, 4]. Structural variants (SVs), defined as genomic alterations greater than 50 base pairs (bp) [5], can functionally affect cellular physiology by forming genetic lesions that may lead to gene dysregulation or novel gene fusions, driving the development of diseases such as cancer [6, 7], Mendelian disorders [8, 9], and complex diseases [10]. There are currently two main classes of sequencing-based methods for comprehensive SV detection: long-read or third-generation sequencing (3GS) and short-read or second-generation sequencing (2GS). While 3GS is currently mainly restricted to the study of small genomes [15] or targeted sequencing [16], recent studies have reported mammalian whole-genome sequencing (WGS) [17, 18] but at a higher sequencing cost per megabase as compared to
