Abstract
Genotype imputation has the potential to assess human genetic variation at a lower cost than assaying the variants using laboratory techniques. The performance of imputation for rare variants has not been comprehensively studied. We utilized 8865 human samples with high depth resequencing data for the exons and flanking regions of 202 genes and Genome-Wide Association Study (GWAS) data to characterize the performance of genotype imputation for rare variants. We evaluated reference sets ranging from 100 to 3713 subjects for imputing into samples typed for the Affymetrix (500K and 6.0) and Illumina 550K GWAS panels. The proportion of variants that could be well imputed (true r²>0.7) with a reference panel of 3713 individuals was: 31% (Illumina 550K) or 25% (Affymetrix 500K) with Minor Allele Frequency (MAF) ≤0.001, 48% or 35% with 0.001<MAF≤0.005, 54% or 38% with 0.005<MAF≤0.01, 78% or 57% with 0.01<MAF≤0.05, and 97% or 86% with MAF>0.05. The performance for common SNPs (MAF>0.05) within exons and flanking regions is comparable to imputation of more uniformly distributed SNPs. The performance for rare SNPs (0.01<MAF≤0.05) was much more dependent on the GWAS panel and the number of reference samples. These results suggest routine use of genotype imputation for extending the assessment of common variants identified in humans via targeted exon resequencing into additional samples with GWAS data, but imputation of very rare variants (MAF≤0.005) will require reference panels with thousands of subjects.
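The "well imputed" proportions quoted above can be illustrated with a minimal sketch. It assumes that "true r²" means the squared Pearson correlation between the sequenced genotypes (0/1/2 allele counts) and the imputed dosages, and that a variant with no variation in either vector scores 0; both are assumptions rather than details stated in the abstract. The function names and bin labels are hypothetical, and only the bin boundaries and the 0.7 threshold come from the text.

```python
from collections import defaultdict

# Upper bounds of the MAF bins reported in the abstract; labels are illustrative.
MAF_BINS = [
    (0.001, "MAF<=0.001"),
    (0.005, "0.001<MAF<=0.005"),
    (0.01, "0.005<MAF<=0.01"),
    (0.05, "0.01<MAF<=0.05"),
]


def minor_allele_frequency(genotypes):
    """Return the MAF for a list of 0/1/2 alternate-allele counts."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def squared_correlation(truth, dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosages))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        # Assumption: an undefined correlation is treated as not imputable.
        return 0.0
    return cov * cov / (var_t * var_d)


def maf_bin(maf):
    """Map a MAF value to the label of the bin that contains it."""
    for upper, label in MAF_BINS:
        if maf <= upper:
            return label
    return "MAF>0.05"


def proportion_well_imputed(variants, threshold=0.7):
    """variants: iterable of (true_genotypes, imputed_dosages) pairs.

    Returns {bin label: fraction of variants with r^2 above threshold}.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [well imputed, total]
    for truth, dosages in variants:
        label = maf_bin(minor_allele_frequency(truth))
        counts[label][1] += 1
        if squared_correlation(truth, dosages) > threshold:
            counts[label][0] += 1
    return {label: good / total for label, (good, total) in counts.items()}
```

Calling `proportion_well_imputed` once per GWAS panel and reference-panel size would yield a table of the same shape as the percentages listed in the abstract, although the study's own pipeline may differ in its details.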
Highlights
Imputation and analysis of untyped genetic variants provides a more comprehensive picture of genetic variation within a genomic region than analysis of only typed variants [1]
This evaluation focused on characterizing the performance of genotype imputation for reference panels of 100 to 3713 subjects and variants present in the exons and flanking regions of genes
As the DeepSeq Variant Set had high quality genotype calls derived from high depth sequence data for 8865 subjects, we were able to characterize genotype imputation performance for variants with minor allele frequencies less than 0.01
Summary
Imputation and analysis of untyped genetic variants provides a more comprehensive picture of genetic variation within a genomic region than analysis of only typed variants [1]. High depth sequence data for thousands of samples has resulted in high quality rare variant calls in minor allele frequency (MAF) ranges not seen with prior HapMap or 1000 Genomes efforts. Those earlier efforts sequenced a smaller number of individuals and used technologies with lower confidence in very rare heterozygous calls. As studies that sequence the exomes of genes are in progress, characterizing the performance of imputation methods for variants in the exons and flanking regions, especially for variants with MAF less than or equal to 0.05, will provide a comprehensive picture of the use of imputation to extend these association studies into additional samples with GWAS data. No other study has provided a summary of the performance of genotype imputation for variants with minor allele frequencies less than 0.01.