Abstract

Purpose: Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity, it also increases the turnaround time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data as a way to reduce the need for orthogonal testing.

Methods: We sequenced five reference human genome samples characterized by the Genome in a Bottle Consortium (GIAB). We compared the results with an established set of variants for each genome, referred to as a truth set. We then trained machine learning models to identify variants that were labeled as false positives.

Results: After training, the models identified 99.5% of the false positive heterozygous single-nucleotide variants (SNVs) and heterozygous insertion/deletion variants (indels). At the same time, they reduced confirmatory testing of nonactionable, nonprimary SNVs by 85% and indels by 75%. Employing the algorithm in clinical practice reduced overall orthogonal testing using dideoxynucleotide (Sanger) sequencing by 71%.

Conclusion: Our results indicate that a low false positive call rate can be maintained while significantly reducing the need for confirmatory testing. The framework that generated our models and results is publicly available at https://github.com/HudsonAlpha/STEVE.
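The Results describe an operating point. Nearly all false positives (99.5%) are still flagged for confirmation, while most true calls are spared. Assume a model that outputs a score where higher means "more likely a false positive". One simple way to pick such a point is to choose the highest cutoff that still flags the target fraction of labeled false positives. You then measure how many true positives fall below that cutoff.

This is a hypothetical sketch of that idea, not the STEVE implementation. `choose_cutoff` and `evaluate` are illustrative names, and the scores are made-up toy values. In a real workflow the cutoff would normally be chosen on training data and then checked on held-out samples.

```python
import math
from typing import List, Sequence, Tuple


def choose_cutoff(scores: Sequence[float], is_fp: Sequence[bool],
                  target_capture: float = 0.995) -> float:
    """Highest score cutoff that still flags at least `target_capture`
    of the labeled false positives (a call is flagged if score >= cutoff)."""
    fp_scores: List[float] = sorted(
        (s for s, fp in zip(scores, is_fp) if fp), reverse=True)
    if not fp_scores:
        raise ValueError("at least one labeled false positive is required")
    # Number of false positives that must be flagged; the small tolerance keeps
    # floating-point error in target_capture * n from bumping ceil() up by one.
    k = max(1, math.ceil(target_capture * len(fp_scores) - 1e-9))
    return fp_scores[k - 1]


def evaluate(scores: Sequence[float], is_fp: Sequence[bool],
             cutoff: float) -> Tuple[float, float]:
    """Return (fraction of false positives flagged for confirmation,
    fraction of true positives cleared, i.e. spared confirmation)."""
    fp = [s for s, f in zip(scores, is_fp) if f]
    tp = [s for s, f in zip(scores, is_fp) if not f]
    capture = sum(s >= cutoff for s in fp) / len(fp)
    cleared = sum(s < cutoff for s in tp) / len(tp)
    return capture, cleared


# Toy data (hypothetical): first 10 calls are false positives, last 10 are true.
scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3,
          0.05, 0.1, 0.15, 0.2, 0.25, 0.35, 0.45, 0.55, 0.65, 0.02]
is_fp = [True] * 10 + [False] * 10

# With only 10 false positives a 99.5% target would require flagging all of
# them, so the toy example uses a 90% target instead.
cutoff = choose_cutoff(scores, is_fp, target_capture=0.9)
capture, cleared = evaluate(scores, is_fp, cutoff)
print(cutoff, capture, cleared)  # 0.4 0.9 0.7
```

Raising the capture target lowers the cutoff, which sends more true calls to confirmation. The trade-off in the Results (99.5% capture versus an 85% or 75% reduction) is a point on this curve.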

Highlights

  • The American College of Medical Genetics and Genomics (ACMG) and the College of American Pathologists (CAP) recommend orthogonal confirmation (e.g., Sanger sequencing) for reported variants to reduce the risk of false positive results.[3,4]

  • Across all samples, the Dragen pipeline produced more than 24 million true positive calls and 137 thousand false positive calls.

  • Details of these counts by sample, variant type, and genotype are available in the Supplemental Material. It also contains a detailed description of the pipeline and the Real Time Genomics (RTG) vcfeval invocations.
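Labeling each call as a true or false positive is the step that produces the training data. RTG vcfeval performs haplotype-aware matching, so it can recognize a variant even when the call set and the truth set represent it differently.

The sketch below is a much simpler stand-in and not the comparison used in the study. It treats a call as a true positive only if the truth set contains a record with the same chromosome, position, alleles, and genotype. It ignores region restrictions and variant normalization. `Variant`, `label_calls`, and `summarize` are hypothetical names, and the records are toy data.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int        # 1-based position, as in VCF
    ref: str
    alt: str
    genotype: str   # unphased genotype string, e.g. "0/1" or "1/1"


def label_calls(calls: Iterable[Variant],
                truth: Iterable[Variant]) -> List[Tuple[Variant, bool]]:
    """Pair each call with True if it is a false positive, i.e. no truth-set
    record matches it exactly on chrom, pos, ref, alt, and genotype.
    (Simplification: a genotype mismatch is counted as a false positive.)"""
    truth_set = set(truth)  # NamedTuples hash by their field values
    return [(call, call not in truth_set) for call in calls]


def summarize(labels: List[Tuple[Variant, bool]]) -> Tuple[int, int, float]:
    """Return (true positives, false positives, precision)."""
    fp = sum(1 for _, is_fp in labels if is_fp)
    tp = len(labels) - fp
    precision = tp / len(labels) if labels else float("nan")
    return tp, fp, precision


truth = [
    Variant("chr1", 100, "A", "G", "0/1"),
    Variant("chr1", 200, "C", "T", "1/1"),
    Variant("chr2", 50, "AT", "A", "0/1"),
]
calls = [
    Variant("chr1", 100, "A", "G", "0/1"),  # matches truth
    Variant("chr1", 200, "C", "T", "0/1"),  # genotype mismatch -> false positive
    Variant("chr2", 50, "AT", "A", "0/1"),  # matches truth (deletion)
    Variant("chr3", 10, "G", "C", "0/1"),   # absent from truth -> false positive
]
labels = label_calls(calls, truth)
print(summarize(labels))  # (2, 2, 0.5)
```

Applied to the aggregate counts above, precision is roughly 24,000,000 / (24,000,000 + 137,000) ≈ 99.43%, so about 0.57% of calls were false positives. The true positive count is stated as a lower bound, so the actual precision is slightly higher.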

Summary

Introduction

Clinical next-generation sequencing (NGS) is widely used to identify a molecular diagnosis in patients with suspected genetic disorders.[1,2] NGS pipelines are known to introduce both random and systematic errors at the sequencing, alignment, and variant calling steps.[3,4] Because the reported variants can impact patient care, two bodies recommend orthogonal confirmation (e.g., Sanger sequencing) for reported variants to reduce the risk of false positive results: the American College of Medical Genetics and Genomics (ACMG) and the College of American Pathologists (CAP).[3,4] Orthogonal confirmation increases both the cost and turnaround time of the NGS test.

Meanwhile, the total number of variants that are candidates for clinical reporting is steadily increasing, as demonstrated by the growth in public databases such as ClinVar and OMIM.[5,6] If every reported variant is orthogonally confirmed, the effective cost of NGS will therefore rise steadily as more variants are sent for confirmation.
