Abstract

Whole genome sequencing (WGS) remains prohibitively expensive, which has encouraged the development of methods to impute WGS data into nonsequenced individuals using a framework of single nucleotide polymorphisms genotyped for genome-wide association studies (GWAS). Although successful methods have been developed for cohorts of unrelated individuals, current imputation methods in related individuals are limited by pedigree size, by the distance of relationships, or by computation time. In this article, we describe a method for imputation in arbitrarily shaped multigenerational pedigrees that can impute genotypes across distantly related individuals based on identity by descent. We evaluate this approach using GWAS data and apply this approach to WGS data distributed for Genetic Analysis Workshop 18.

Highlights

  • Recent years have seen a sharp increase in the throughput of genotyping in human cohorts, driven largely by two technologies: microarray-based approaches, which genotype hundreds of thousands of markers tagging common haplotype blocks for genome-wide association studies (GWAS), and whole genome sequencing (WGS), which allows near-comprehensive discovery of genotypes.

  • To assess the robustness of this approach to the availability of set 2 data, we repeated the masking experiment, this time masking genotypes at random using a probability drawn at random between 0 and 1 for each marker. This test showed that the accuracy of the described imputation approach is robust across levels of available data.

  • We have developed a method for imputing WGS data in arbitrarily shaped pedigrees using a GWAS framework.
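
The masking experiment described in the second highlight can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `None` sentinel for a masked genotype, the 0/1/2 genotype coding, and the rows-as-individuals layout are all assumptions made for illustration. For each marker it draws a masking probability between 0 and 1, hides each genotype at that marker with that probability, and then scores an imputed matrix only at the hidden positions.

```python
import random

MISSING = None  # hypothetical sentinel for a masked genotype


def mask_genotypes(genotypes, rng):
    """Mask a genotype matrix (rows = individuals, columns = markers).

    For each marker, draw a masking probability uniformly from [0, 1), then
    hide each individual's genotype at that marker with that probability.
    Returns the masked matrix and the per-marker probabilities.
    """
    n_markers = len(genotypes[0]) if genotypes else 0
    probs = [rng.random() for _ in range(n_markers)]
    masked = [
        [MISSING if rng.random() < probs[j] else g for j, g in enumerate(row)]
        for row in genotypes
    ]
    return masked, probs


def concordance(truth, masked, imputed):
    """Fraction of masked genotypes that the imputed matrix recovers."""
    total = correct = 0
    for t_row, m_row, i_row in zip(truth, masked, imputed):
        for t, m, i in zip(t_row, m_row, i_row):
            if m is MISSING:
                total += 1
                correct += (i == t)
    return correct / total if total else float("nan")


# Toy data: 4 individuals, 5 markers, genotypes coded as 0/1/2 (assumed coding).
rng = random.Random(0)
truth = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(4)]
masked, probs = mask_genotypes(truth, rng)
n_masked = sum(g is MISSING for row in masked for g in row)
print(f"masked {n_masked} of {len(truth) * len(truth[0])} genotypes")
```

Because the masking probability varies by marker, a single run covers markers with nearly complete data as well as markers with almost none, which is what allows accuracy to be compared across levels of available data.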


Summary

Introduction

Recent years have seen a sharp increase in the throughput of genotyping in human cohorts, driven largely by two technologies: microarray-based approaches, which genotype hundreds of thousands of markers tagging common haplotype blocks for genome-wide association studies (GWAS), and WGS, which allows near-comprehensive discovery of genotypes. Merlin [2] and Mendel [3] are two programs with built-in genotype imputation options designed for pedigrees. Markov chain Monte Carlo (MCMC) methods that sample across potential identity-by-descent (IBD) states make broader use of the information shared between more distant relatives, but they are still being developed as an imputation approach [4]. Such methods require iterative sampling, which is computationally expensive and may limit their utility in laboratories with limited computing resources. An ideal imputation approach would be highly accurate and computationally fast and would consider all chromosomal segments that are identical by descent within the pedigree.

