Abstract

Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers.

Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, the NextSV stringent call set had higher precision and balanced accuracy (F1 score), while the NextSV sensitive call set had higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between sequencing costs and recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset.

Conclusions: Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
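The abstract compares call sets by precision, recall and F1 score. As a minimal sketch of how these three metrics relate, the snippet below computes them from counts of true-positive, false-positive and false-negative calls. The function name `sv_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study or from NextSV itself.

```python
def sv_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) for an SV call set.

    true_positives:  calls that match a variant in the truth set
    false_positives: calls with no match in the truth set
    false_negatives: truth-set variants that were not called
    """
    called = true_positives + false_positives
    truth = true_positives + false_negatives
    # Guard against empty call sets or empty truth sets.
    precision = true_positives / called if called else 0.0
    recall = true_positives / truth if truth else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not results from the paper).
precision, recall, f1 = sv_metrics(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a call set that gains recall at the expense of precision (or the reverse) can still score lower on F1, which is why a sensitive and a stringent call set can each be preferable for different purposes.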

Highlights

  • Structural variants (SVs) in human genomes are implicated in a variety of human diseases

  • Using data sets of NA12878, HX1 and the Ashkenazi Jewish (AJ) family trio, we evaluated the performance of four aligner/SV caller combinations (BLASR/PBHoney-Spots, BLASR/PBHoney-Tails, BWA/Sniffles and NGMLR/Sniffles) as well as NextSV under different PacBio coverages

  • Performance of SV calling on different coverages of the NA12878 genome: to determine the optimal coverage for SV detection on PacBio data, we evaluated the performance of NextSV

Summary

Introduction

Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Pacific Biosciences (PacBio) produces long-read sequencing data, making it potentially well-suited for SV detection in personal genomes [10, 11]. Merker et al. reported the application of low coverage whole genome PacBio sequencing to identify pathogenic structural variants in a patient with autosomal dominant Carney complex, for whom targeted clinical gene testing and whole genome short-read sequencing were both negative [12]. This is a clear example of how long-read sequencing may solve some negative cases in clinical diagnostic settings.
