Abstract

There is growing recognition of the vital links between structural variations (SVs) and diverse diseases. Research suggests that, because they produce much longer DNA fragments carrying abundant contextual information, long-read technologies have advantages in SV detection, even in complex repetitive regions. Several pipelines for calling SVs from long-read sequencing data have been proposed and used in human genome research. However, the performance of these pipelines has not yet been explored in depth or adequately compared. In this study, we comprehensively evaluated the performance of three commonly used long-read SV detection pipelines, namely PBSV, Sniffles and PBHoney, with particular attention to SV detection in tandem repeat regions (TRRs). Using a robust benchmark for germline SV detection as the gold standard, we estimated the precision, recall and F1 score of the insertions and deletions detected by each pipeline. Our results revealed that all three pipelines performed clearly better outside TRRs than inside them. The F1 scores of Sniffles inside and outside TRRs were 0.60 and 0.76, respectively. The performance of PBSV was similar to that of Sniffles and was generally higher than that of PBHoney. In conclusion, our findings can help researchers choose an appropriate pipeline in practice and complement the application of long-read sequencing technologies in research on rare diseases.

Highlights

  • Previous studies typically defined structural variations as genomic changes of at least 50 base pairs in size

  • Using the benchmark established by the Genome in a Bottle (GIAB) Consortium (Zook et al, 2020) as the gold standard, we evaluated the precision, recall and F1 score of these pipelines

  • PBSV detected the largest number of SVs and Sniffles detected the fewest
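
The precision, recall and F1 score used in this evaluation follow their standard definitions. As a minimal sketch (not the authors' evaluation code), they can be computed from the numbers of true-positive, false-positive and false-negative SV calls relative to a benchmark. The function name and the counts below are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from call counts against a benchmark.

    tp: calls that match a benchmark SV
    fp: calls with no matching benchmark SV
    fn: benchmark SVs that were not called
    """
    # Guard against division by zero when a pipeline makes no calls
    # or the benchmark contains no SVs in the region of interest.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the study.
p, r, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.75 recall=0.60 F1=0.67
```

Computing the three values separately for calls inside and outside TRRs, as the study does, only requires partitioning the counts by region before calling the function.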


Summary

Introduction

Previous studies typically defined structural variations (SVs) as genomic changes of at least 50 base pairs (bp) in size. SVs are closely related to diverse human diseases (Weischenfeldt et al, 2013; Lupski, 2015), such as autism (Pinto et al, 2010; Sanders et al, 2012; Chen et al, 2017) and schizophrenia (Sebat et al, 2007; Stefansson et al, 2008; Walsh et al, 2008; Kirov et al, 2012). Compared with single-nucleotide variations (SNVs), SVs span more nucleotides and are considered to be more strongly correlated with evolution, genetic diversity and disease-causing mutations (Stankiewicz and Lupski, 2010; Weischenfeldt et al, 2013; Abel et al, 2020).

