Abstract

Sequencing family DNA samples provides an attractive alternative to population based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.

Highlights

  • DNA sequencing is being routinely carried out to identify genetic factors, rare variants in particular, associated with human disease

  • These lines of evidence show that family studies are emerging as a powerful approach in the sequencing era to localize genetic factors for human disease, and will play a key role as a complementary approach to the population based design to help understand the genetic basis of complex traits

  • Mendelian error is extremely rare for unphased genotypes, and is completely eliminated for haplotype calls. Through both simulations and real data, we show that Polymutt2 significantly outperforms other tools, including GATK, Beagle4 and Polymutt, for genotype calling on pedigree data, especially for rare variants in low coverage data

Read more

Summary

Introduction

DNA sequencing is being routinely carried out to identify genetic factors, rare variants in particular, associated with human disease. Recent studies demonstrated the effectiveness of sequencing families and identified associated rare variants for a variety of traits, including schizophrenia [9], Alzheimer [10], and hypertriglyceridemia [11]. These lines of evidence show that family studies are emerging as a powerful approach in the sequencing era to localize genetic factors for human disease, and will play a key role as a complementary approach to the population based design to help understand the genetic basis of complex traits

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.