Abstract

In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanisms of complex traits. However, typical association approaches suffer from several important drawbacks: estimating variance components in LMMs for large numbers of individuals is computationally slow, and single-locus models handle complex confounding poorly and lose statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. First, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, using spectral decomposition of the covariance matrix to estimate variance components quickly. Second, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. Our method always had the highest average statistical power compared with multi-locus mixed-model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML provides more accurate estimates of marker effects. HRePML also identified 41 previously reported genes associated with developmental traits in Arabidopsis, more than were detected by the other methods.
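The abstract does not give the formulas behind the first stage, so the following is only a minimal sketch of the general idea of using a spectral decomposition to speed up REML, not the authors' implementation. It assumes the kinship (covariance) matrix has already been decomposed as K = U diag(d) Uᵀ and that the phenotype and a single fixed-effect column (for example the intercept) have been rotated by Uᵀ. The function names, the single-column restriction, and the grid search over λ = σ_g²/σ_e² are all illustrative assumptions.

```python
import math


def reml_profile_loglik(lam, d, y_rot, x_rot):
    """Profile REML log-likelihood (up to a constant) at lam = sigma_g^2 / sigma_e^2.

    d     : eigenvalues of the kinship matrix K = U diag(d) U^T (assumed precomputed)
    y_rot : U^T y, the rotated phenotype
    x_rot : U^T x, one rotated fixed-effect column (assumption: p = 1)
    """
    n = len(d)
    # After rotation the covariance is diagonal: sigma_e^2 * (lam * d_i + 1).
    w = [1.0 / (lam * di + 1.0) for di in d]
    xwx = sum(wi * xi * xi for wi, xi in zip(w, x_rot))
    xwy = sum(wi * xi * yi for wi, xi, yi in zip(w, x_rot, y_rot))
    beta = xwy / xwx  # generalized least squares estimate of the fixed effect
    rss = sum(wi * (yi - xi * beta) ** 2 for wi, xi, yi in zip(w, x_rot, y_rot))
    sigma_e2 = rss / (n - 1)  # REML estimate of the residual variance (p = 1)
    log_det_v = sum(math.log(lam * di + 1.0) for di in d)
    loglik = -0.5 * ((n - 1) * math.log(sigma_e2) + log_det_v + math.log(xwx))
    return loglik, beta, sigma_e2


def fit_reml(d, y_rot, x_rot):
    """Grid search over log10(lam) in [-5, 5]; each evaluation costs only O(n)."""
    best = None
    for i in range(-50, 51):
        lam = 10.0 ** (i / 10.0)
        loglik, beta, sigma_e2 = reml_profile_loglik(lam, d, y_rot, x_rot)
        if best is None or loglik > best["loglik"]:
            best = {
                "lam": lam,
                "loglik": loglik,
                "beta": beta,
                "sigma_e2": sigma_e2,
                "sigma_g2": lam * sigma_e2,
            }
    return best
```

The point of the rotation is that the expensive eigendecomposition is done once, after which every likelihood evaluation reduces to sums over n diagonal terms instead of an n × n matrix inversion. A production implementation would typically replace the grid search with a one-dimensional optimizer and support several fixed-effect columns.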

Highlights

  • Genome-wide association studies (GWAS) can advance our understanding of the molecular mechanisms of complex traits [1,2,3,4]

  • To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses and compared it with three methods: multi-locus mixed-model (MLMM) [26], fixed and random model circulating probability unification (FarmCPU) [27], and genome-wide efficient mixed model association (GEMMA) [9]

Introduction

Genome-wide association studies (GWAS) can advance our understanding of the molecular mechanisms of complex traits [1,2,3,4]. The most popular approach tests one SNP (single nucleotide polymorphism) at a time, which is flexible enough to be applied to all kinds of models. Because so many SNPs are tested, multiple-testing adjustment is required, which results in stringent p-value thresholds. Xu et al. [5] proposed a model-based clustering method that borrows information across SNPs and increases signal strength by properly clustering SNPs. Lee and Lee [6] presented a web application for network-based Arabidopsis genome-wide association boosting, which can identify weak association signals by integrating co-functional gene network information. In addition, the linear mixed model (LMM) has become a widely used methodology because it can control for population stratification and accommodate related individuals [7].
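To make the multiple-testing point above concrete, here is a small sketch of a Bonferroni adjustment, one common correction; the text does not say which adjustment is intended, so this choice and the function names are assumptions for illustration only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return indices of SNPs whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]
```

With one million tested SNPs and a family-wise level of 0.05, the per-SNP threshold becomes 0.05 / 1,000,000, which is why markers with weak effects are easily missed in single-locus scans.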
