Abstract

Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 individuals recorded at NCBI for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the ranges of very rare (5E−04–1E−03) and rare (1E−03–5E−03) variants (p < 1E−04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E−04). Thus, to maintain confidence while retaining enough SNVs, we propose a two-step filtering procedure that allows less stringent filtering both prior to imputation and post-imputation, in order to increase the number of very rare and rare variants compared to conservative filtration methods.
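
As a rough illustration of the post-imputation step, the sketch below labels variants with the two frequency ranges quoted in the abstract and counts how many survive a given imputation quality score cutoff. The `Variant` record, the function names, the half-open bin boundaries and the toy data are all assumptions made for illustration; the study's actual pipeline and software are not described in this text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one imputed variant."""
    variant_id: str
    maf: float   # minor allele frequency
    info: float  # imputation quality score, assumed to lie in [0, 1]


def frequency_class(maf):
    """Label a variant using the frequency ranges quoted in the abstract.

    Treating each range as half-open (lower bound included) is an assumption.
    """
    if 5e-4 <= maf < 1e-3:
        return "very rare"
    if 1e-3 <= maf < 5e-3:
        return "rare"
    return "other"  # anything outside the two quoted ranges


def post_filter(variants, min_info):
    """Keep variants whose quality score reaches the chosen cutoff."""
    return [v for v in variants if v.info >= min_info]


def count_by_class(variants):
    """Count the retained variants in each frequency class."""
    return Counter(frequency_class(v.maf) for v in variants)


# Made-up toy data, not results from the study.
toy = [
    Variant("v1", 0.0006, 0.35),
    Variant("v2", 0.0008, 0.85),
    Variant("v3", 0.0020, 0.50),
    Variant("v4", 0.0040, 0.90),
    Variant("v5", 0.2000, 0.99),
]

lenient = count_by_class(post_filter(toy, 0.3))
strict = count_by_class(post_filter(toy, 0.8))
```

Comparing `lenient` with `strict` on real data would show how many very rare and rare variants a stricter cutoff costs, which is the trade-off the two-step procedure is meant to balance.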

Highlights

  • Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation, but they result in the loss of many single nucleotide polymorphisms (SNPs)

  • Rare variants are difficult to investigate; in many of these studies, the SNPs of individuals are routinely removed prior to imputation[7,8], which can lead to a loss of information or loss of accuracy when imputing the unaccounted-for SNPs that may be in linkage disequilibrium (LD) with SNVs[38]

  • The minor allele frequency (MAF) was determined for all variants and compared with the NCBI gMAF from dbSNP B137, which is based on 1089 individuals from the 1000 Genomes Project phase 1 (1000GP1)
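
The MAF comparison in the last highlight can be sketched as follows. This is a minimal example that assumes genotypes are coded as alternate-allele counts (0, 1 or 2, with `None` for a missing call); the function names and the coding scheme are hypothetical and are not taken from the study.

```python
def minor_allele_frequency(alt_allele_counts):
    """Compute the MAF from per-individual alternate-allele counts.

    Each entry is 0, 1 or 2 for a diploid genotype, or None if uncalled.
    """
    called = [g for g in alt_allele_counts if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each called individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def maf_difference(sample_maf, reference_maf):
    """Absolute difference between a sample MAF and a reference value."""
    return abs(sample_maf - reference_maf)


sample_maf = minor_allele_frequency([0, 1, 2, None, 0])
```

Note that the minor allele in a given sample need not be the same allele as in the reference panel, so a real comparison would also need to check which allele each frequency refers to.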


Summary

Introduction

Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation, but they result in the loss of many single nucleotide polymorphisms (SNPs). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E−04). Initial filtration of SNVs (pre-filtration) was considered necessary to ensure correct inference of SNPs during imputation[7,8]. This was mostly based on the routine QC applied in association studies and fine mapping. It has been shown that filtering out low-quality SNVs, rather than incorporating them with a low quality score weight[40,41], can decrease the power of locus-based approaches when the causal
