Abstract

Background

Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases. Because evaluating bioinformatics pipelines is not straightforward, NGS demands effective strategies for analyzing data that are of paramount relevance for clinical decision making. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality.

Results

We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics using our set-theory approach showed high-resolution pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 for all cases. Moreover, significant differences were obtained between the three pipelines. In general, results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set.

Conclusions

Using our set-theory approach to calculate metrics, the three selected pipelines identified the expected ICC-related variants, but results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
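To make the set-theory idea concrete, the following is a minimal sketch of how precision and recall can be derived from set operations on a call set and a gold standard set, both restricted to high confidence regions. It is an illustration under stated assumptions, not the authors' implementation: the names `Variant`, `Region`, `in_regions`, and `evaluate` are hypothetical, the toy coordinates are invented, and variants are matched by exact (chromosome, position, ref, alt) identity, which ignores the representation differences that real benchmarking tools normalize.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal variant record; real pipelines would parse these from VCF.
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


class Region(NamedTuple):
    # Hypothetical high confidence interval, 1-based and inclusive on both ends.
    chrom: str
    start: int
    end: int


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """Return True if the variant position falls inside any region."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )


def evaluate(calls: set, truth: set, regions: list) -> dict:
    """Compute TP/FP/FN counts, precision, and recall with set operations."""
    # Keep only variants located inside the high confidence regions.
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}

    tp = calls_hc & truth_hc   # called and present in the gold standard
    fp = calls_hc - truth_hc   # called but absent from the gold standard
    fn = truth_hc - calls_hc   # in the gold standard but not called

    precision = len(tp) / len(calls_hc) if calls_hc else 0.0
    recall = len(tp) / len(truth_hc) if truth_hc else 0.0
    return {
        "tp": len(tp),
        "fp": len(fp),
        "fn": len(fn),
        "precision": precision,
        "recall": recall,
    }


# Toy data (invented for illustration only).
regions = [Region("chr1", 100, 200), Region("chr2", 50, 80)]
truth = {
    Variant("chr1", 120, "A", "G"),
    Variant("chr1", 150, "C", "T"),
    Variant("chr2", 60, "G", "A"),
    Variant("chr3", 10, "T", "C"),   # outside the regions, ignored
}
calls = {
    Variant("chr1", 120, "A", "G"),
    Variant("chr1", 150, "C", "T"),
    Variant("chr2", 60, "G", "A"),
    Variant("chr1", 180, "G", "C"),  # inside a region, not in truth: false positive
    Variant("chr4", 5, "A", "T"),    # outside the regions, ignored
}

result = evaluate(calls, truth, regions)
print(result)
```

In this toy case every gold standard variant inside the regions is recovered, so recall is 1.0, while one extra call lowers precision to 0.75. This mirrors the pattern reported above, where recall is perfect and the pipelines differ in precision.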

Highlights

  • Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases

  • To evaluate the performance of three variant callers using targeted sequencing data, the Isaac, Freebayes, and VarScan pipelines were implemented

  • Performance comparison in the number of variants identified per chromosome revealed that Freebayes and Isaac had a similar resolution, which contrasted with slightly higher values obtained by VarScan (Fig. 2c)


Summary

Introduction

Next-Generation Sequencing (NGS) technologies and applications, including whole genome, whole exome, and targeted sequencing, have drastically improved the study of hereditary diseases. Different pipelines can produce dramatically different results, which affects medical decisions in clinical laboratories that are developing or rely on sequencing-based tests [9]. This may involve changes in the diagnosis, prognosis, and treatment of patients [20]. We used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality.

