VarSight: prioritizing clinically reported variants with binary classification algorithms

James M Holt,Manavalan Gajapathy,Fariba Shaterferdosian,Brandon Wilk,Julie A Anderson,Angelina E Uno-Antonison,Jacob M Kelly,Nadiya Sosonkina,Donna M Brown,Camille L Birch,Arthur Weborg,Jeremy M Harris,Melissa A Wilk,Elizabeth A Worthey,Alexander C Moss

doi:10.1186/s12859-019-3026-8

Open Access

Abstract

Background: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient’s phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance.

Methods: We tested the application of classification algorithms that ingest variant annotations along with phenotype information for predicting whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network.

Results: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. We showed that the trained classifiers outperformed all other tested methods, with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20.

Conclusions: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.
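The "top 20" figures in the Results can be read as a ranking metric: each patient's variants are sorted by classifier score, and a reported variant counts as a hit if its rank is at most 20. The following is a minimal sketch of that calculation, not the authors' code. The function names, the data layout (one score dictionary and one set of reported variants per patient), and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: one (scores, reported) pair per patient, where
# `scores` maps a variant ID to a classifier score (higher means more
# likely to be reported) and `reported` holds the reported variant IDs.
Case = Tuple[Dict[str, float], Set[str]]


def rank_variants(scores: Dict[str, float]) -> Dict[str, int]:
    """Return a 1-based rank per variant; the highest score gets rank 1.

    Ties keep insertion order because Python's sort is stable.
    """
    ordered = sorted(scores, key=lambda variant: scores[variant], reverse=True)
    return {variant: position + 1 for position, variant in enumerate(ordered)}


def top_k_fraction(cases: List[Case], k: int = 20) -> float:
    """Fraction of reported variants, pooled over patients, ranked in the top k."""
    hits = 0
    total = 0
    for scores, reported in cases:
        ranks = rank_variants(scores)
        for variant in reported:
            total += 1
            if ranks[variant] <= k:
                hits += 1
    return hits / total if total else 0.0


# Toy data, invented for illustration only.
toy_cases: List[Case] = [
    ({"a": 0.9, "b": 0.2, "c": 0.5}, {"a"}),  # reported variant ranks 1st
    ({"x": 0.1, "y": 0.8, "z": 0.3}, {"x"}),  # reported variant ranks 3rd
]

print(top_k_fraction(toy_cases, k=1))
print(top_k_fraction(toy_cases, k=3))
```

Pooling hits across patients is one of several reasonable conventions; averaging a per-patient fraction instead would weight patients with few reported variants more heavily.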

Highlights

When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient’s phenotypes
These technologies are applied clinically by following workflows consisting of blood draw, sequencing, alignment, variant calling, variant annotation, variant filtering, and variant prioritization [4, 5]
We focus on real patient data from the multi-site collaboration of the Undiagnosed Diseases Network (UDN) [1]

Summary

Introduction

When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient’s phenotypes. This is done through annotation, filtering, and prioritization of variants for manual curation. Genome and exome sequencing are both currently being used as molecular diagnostic tools for patients with rare, undiagnosed diseases [1,2,3]. These technologies are applied clinically by following workflows consisting of blood draw, sequencing, alignment, variant calling, variant annotation, variant filtering, and variant prioritization [4, 5]. These methods use a wide range of annotation sources including but not limited to population allele frequencies [12], conservation scores [13,14,15], haploinsufficiency scores [16, 17], deleteriousness scores [17, 18], transcript impact

Methods

Results

Conclusion