Abstract

Background: The imputation of genotypes increases the power of genome-wide association studies. However, the imputation quality should be assessed in each particular case, and not all imputation software controls the error of its output; for example, the most recent release of the fastPHASE program (1.4.8) lacks such an option. In this software there is also uncertainty in choosing the model parameters. fastPHASE is based on haplotype clusters, the number of which must be set a priori. This parameter influences the results of imputation and of downstream analysis.

Results: We present a software toolkit, imputeqc, to assess imputation quality and/or to choose the model parameters for imputation. We demonstrate the efficacy of the toolkit by evaluating imputations made with both fastPHASE and BEAGLE for HapMap and 1000 Genomes data. The genotype discordance obtained correlated well between the two methods. Using imputeqc, we also show how to choose the optimal number of haplotype clusters and expectation-maximization cycles for fastPHASE. The resulting number of haplotype clusters, 25, was then used for hapFLK testing, which revealed signatures of selection at the LCT region on chromosome 2. We also demonstrate how to decrease the computational time of hapFLK testing from 3 days to 20 h.

Conclusions: The toolkit is implemented as an R package, imputeqc, and command-line scripts. The code is freely available at https://github.com/inzilico/imputeqc under the MIT license.

Highlights

  • The imputation of genotypes increases the power of genome-wide association studies

  • Since the haplotype cluster model is often applied to a pool of populations, we demonstrated imputeqc on a dataset composed of genotypes from the CEU, TSI, CHB and JPT populations of the 1000 Genomes Project

  • We found the optimal number of haplotype clusters to be 25


Summary

Introduction

The imputation of genotypes increases the power of genome-wide association studies. Not all imputation software controls the error of its output; for example, the most recent release of the fastPHASE program (1.4.8) lacks such an option. In this software there is also uncertainty in choosing the model parameters. We present a software toolkit, imputeqc, to assess imputation quality and/or to choose the model parameters for imputation. We show how to choose the optimal number of haplotype clusters and expectation-maximization cycles for fastPHASE. Imputation is an in silico method that infers genotypes for undetermined or missing markers in study samples. It both harmonizes data sets and increases the overall number of markers available for testing. The missing genotypes can be replaced with their estimates obtained by imputation.
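
As a rough illustration of how genotype discordance could be used to assess imputation and to pick a model parameter, the following Python sketch hides a fraction of known genotypes, compares the imputed values with the hidden ones, and selects the parameter value with the lowest mean discordance. It is not the imputeqc API. The masking scheme, the function names (`mask_genotypes`, `discordance`, `best_parameter`) and the 0/1/2 genotype coding are all assumptions made for the example, and the imputation step itself (for example, a fastPHASE or BEAGLE run) is left to the reader.

```python
import random

MISSING = None  # assumed marker for a hidden genotype (hypothetical convention)


def mask_genotypes(genotypes, proportion, seed=0):
    """Hide a random fraction of the known genotypes.

    genotypes: list of rows (one per sample), each a list of genotype codes.
    Returns a masked copy and the list of (row, column) positions hidden.
    """
    rng = random.Random(seed)
    known = [(i, j)
             for i, row in enumerate(genotypes)
             for j, g in enumerate(row)
             if g is not MISSING]
    n_mask = round(len(known) * proportion)
    positions = rng.sample(known, n_mask)
    masked = [list(row) for row in genotypes]
    for i, j in positions:
        masked[i][j] = MISSING
    return masked, positions


def discordance(original, imputed, positions):
    """Fraction of hidden genotypes that the imputation got wrong."""
    if not positions:
        raise ValueError("no masked positions to compare")
    wrong = sum(1 for i, j in positions if original[i][j] != imputed[i][j])
    return wrong / len(positions)


def best_parameter(errors_by_value):
    """Pick the parameter value with the lowest mean discordance.

    errors_by_value: dict mapping a candidate value (e.g. a number of
    haplotype clusters) to discordance values from repeated masked runs.
    """
    means = {k: sum(v) / len(v) for k, v in errors_by_value.items()}
    return min(means, key=means.get)
```

In use, one would mask the data several times, run the imputation program on each masked copy for every candidate parameter value, and pass the resulting discordance values to `best_parameter`.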

