Abstract
The ideal genetic analysis of family data would include whole genome sequence data on all family members. A strategy of combining sequence data from a subset of key individuals with inexpensive genome-wide association study (GWAS) chip genotypes on all individuals to infer sequence-level genotypes throughout the families has been suggested as a highly accurate alternative. This strategy was followed by the Genetic Analysis Workshop 18 data providers. We examined the quality of the imputation to identify potential consequences of this strategy by comparing GWAS genotype calls with imputed calls for the same variants and counting the discrepancies. Overall, the inference and imputation process worked very well. However, we found that discrepancies occurred at an increased rate when imputation was used to fill in missing data in sequenced individuals. Although this may be an artifact of this particular instantiation of these analytic methods, there may be general genetic or algorithmic reasons to avoid trying to fill in missing sequence data. This is especially true given the risk of false positives and reduced power for family-based transmission tests when founders are incorrectly imputed as heterozygotes. Finally, we note a higher rate of discrepancies when genotypes of unsequenced individuals are inferred using sequenced individuals from other pedigrees drawn from the same admixed population.
Highlights
The ideal genetic analysis of family data would include whole genome sequence data on all family members
Generation of the data by the Genetic Analysis Workshop 18 (GAW18) providers: We distinguish between two ways that missing data were “filled in” in the GAW18 data. Filling in missing sequence data in the sequenced individuals is referred to as “imputation,” and inferring sequence-level data for individuals who were only genotyped using a genome-wide association study (GWAS) chip is referred to as “inference.” We understand the imputation and inference process followed by the GAW18 data providers to consist of the following steps: (a) the GWAS chip data were phased using MaCH [4], and a haplotype scaffolding for the families was created; (b) missing sequence data in the sequenced individuals were imputed using MaCH; (c) sequence haplotypes for the unsequenced individuals were
We first present the results for the high call rate single-nucleotide polymorphisms (SNPs) alone and compare these with the results found in the full comparison SNP set.
Summary
The ideal genetic analysis of family data would include whole genome sequence data on all family members. A procedure has been suggested to avoid having to sequence every individual [1]. This procedure uses dense sequence data on a subset of individuals and sparse, inexpensive genome-wide association study (GWAS) chip genotypes on all individuals to infer sequence-level genotypes for the related, unsequenced individuals. The Genetic Analysis Workshop 18 (GAW18) data providers followed this procedure, as documented in [2]. We examine the quality of the imputation to identify potential consequences of this approach.
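The comparison described above, counting discrepancies between GWAS chip calls and imputed calls for the same variants, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function name, the dictionary layout, the 0/1/2 minor-allele-count coding, the group labels, and the SNP identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def discordance_by_group(chip_calls, imputed_calls, group_of):
    """Tally chip-versus-imputed genotype discrepancies per group.

    chip_calls, imputed_calls: dict mapping (individual_id, snp_id) to a
        genotype coded as a minor-allele count (0, 1, or 2), or None when
        the call is missing. This coding is an assumption of the sketch.
    group_of: dict mapping individual_id to a label such as "sequenced"
        or "unsequenced" (hypothetical labels).

    Returns {group: (n_discordant, n_compared, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [discordant, compared]
    for key, chip in chip_calls.items():
        imputed = imputed_calls.get(key)
        if chip is None or imputed is None:
            continue  # only genotypes called in both sources are compared
        individual, _snp = key
        tally = counts[group_of[individual]]
        tally[1] += 1
        if chip != imputed:
            tally[0] += 1
    return {group: (d, n, d / n) for group, (d, n) in counts.items()}


# Toy data: individual "A" is sequenced, "B" has chip genotypes only.
chip = {("A", "rs1"): 0, ("A", "rs2"): 1,
        ("B", "rs1"): 2, ("B", "rs2"): 1, ("B", "rs3"): None}
imputed = {("A", "rs1"): 0, ("A", "rs2"): 1,
           ("B", "rs1"): 2, ("B", "rs2"): 0, ("B", "rs3"): 1}
groups = {"A": "sequenced", "B": "unsequenced"}

for group, (n_bad, n_total, rate) in discordance_by_group(chip, imputed, groups).items():
    print(f"{group}: {n_bad}/{n_total} discordant ({rate:.1%})")
```

Grouping by sequencing status keeps the discrepancy rates for imputation (sequenced individuals) and inference (unsequenced individuals) separate, which is the distinction the analysis depends on.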