Effective variant filtering and expected candidate variant yield in studies of rare human disease

Brent S Pedersen, Harriet Dashnow, Martin Tristani-Firouzi, Amelia D Wallace, Rong Mao, Pinar Bayrak-Toydemir, Joshua D Schiffman, Aaron R Quinlan, D Hunter Best, Tatiana Tvrdik, Joe M Brown, Matt Velinder

doi:10.1038/s41525-021-00227-3

Abstract

In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
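As a rough illustration of the kind of per-variant filter the abstract describes, the sketch below checks the four attributes it names: genotype quality, sequencing depth, allele balance, and population allele frequency. The `Call` record, the `passes_filters` function, and every threshold are placeholders invented for this example. They are not the cutoffs the paper recommends, and they are not slivar's interface.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One sample's heterozygous call at a variant site (hypothetical record)."""
    gq: int         # genotype quality
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele
    pop_af: float   # population allele frequency of the variant


def passes_filters(call, min_gq=20, min_depth=10,
                   ab_range=(0.2, 0.8), max_pop_af=0.01):
    """Return True if the call clears all four illustrative cutoffs.

    All default thresholds are assumptions made for this sketch.
    """
    # Reject low-confidence or shallow calls (also avoids dividing by zero).
    if call.depth <= 0 or call.depth < min_depth or call.gq < min_gq:
        return False
    # Allele balance: fraction of reads that support the alternate allele.
    allele_balance = call.alt_reads / call.depth
    if not (ab_range[0] <= allele_balance <= ab_range[1]):
        return False
    # Keep only variants that are rare in the reference population.
    return call.pop_af <= max_pop_af
```

In a real analysis the same checks would be applied to every family member relevant to the inheritance mode being tested, not to a single sample.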

Highlights

Rare human diseases are often caused by de novo or inherited variants in a single protein-coding gene[1,2]
We varied allele balance (AB; i.e., the ratio of reads aligned at a variant locus that support the alternate allele) cutoffs before declaring a variant to be either a Mendelian violation or transmitted variant (Fig. 1)
We used the minimum AB between the parent and the child as the value that was filtered in creating the curve
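The cutoff sweep described in these highlights can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (a tuple of parent AB, child AB, and a flag marking a Mendelian violation) and the function name `sweep_ab_cutoffs` are assumptions made for the example.

```python
def sweep_ab_cutoffs(sites, cutoffs):
    """Count violations and transmitted variants surviving each AB cutoff.

    sites   -- iterable of (parent_ab, child_ab, is_violation) tuples,
               a hypothetical layout chosen for this sketch
    cutoffs -- iterable of allele-balance thresholds to try
    Returns a list of (cutoff, n_violations, n_transmitted) tuples.
    """
    sites = list(sites)
    curve = []
    for cutoff in cutoffs:
        # A site is kept only if the lower of the two ABs clears the cutoff,
        # mirroring the "minimum AB between the parent and the child" rule.
        kept = [s for s in sites if min(s[0], s[1]) >= cutoff]
        n_violations = sum(1 for s in kept if s[2])
        curve.append((cutoff, n_violations, len(kept) - n_violations))
    return curve


# Toy data, invented for illustration only.
example_sites = [
    (0.50, 0.45, False),  # well-balanced, transmitted
    (0.15, 0.50, True),   # low parental AB, Mendelian violation
    (0.30, 0.25, True),   # borderline AB, Mendelian violation
]
example_curve = sweep_ab_cutoffs(example_sites, [0.1, 0.2, 0.3])
```

Plotting the violation count against the transmitted count across cutoffs gives a curve of the kind the highlights attribute to Fig. 1.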

Summary

Introduction

Rare human diseases are often caused by de novo or inherited variants in a single protein-coding gene[1,2]. Isolating the small subset of causal variants from the numerous inconsequential variants in cohort exome and genome datasets remains an analytical bottleneck. The decreasing cost of sequencing has resulted in a dramatic increase in the number of groups analyzing sequence data from rare disease families. Because of alignment and variant calling artifacts[3], careful filtering is required to extract an accurate set of causal variants. Each research group may choose custom strategies, use ad hoc software, or adopt one of many tools designed to facilitate the filtering, including seqr (https://seqr.broadinstitute.org/), GEMINI[4], and genmod. This leads to innumerable possible outcomes when analyzing the same cohort.



Journal: npj Genomic Medicine	Publication Date: Jul 15, 2021
Citations: 65	License type: open-access
