Computational Generalization of Mixed Models on Large-Scale Data with Applications to Genetic Studies

Drinold A Mbete, Emile R Chimusa, Samson W Wanyonyi

doi:10.9734/ajpas/2018/v1i424550

Abstract

Aims: To discuss the LMM-based approaches applied in GWAS and their software implementations, and to classify the computational tools that apply LMM approaches according to their applicability and performance; to identify SNPs possibly associated with a particular disease using different computational tools based on LMM approaches; and to estimate the genetic and residual variance parameters that account for the phenotypic variation of the disease.

Study Design: Case-control study.

Place and Duration of Study: The research was carried out in Tanzania at the African Institute for Mathematical Sciences over six months.

Methodology: Linear mixed models (LMMs) are widely applied in genome-wide association studies (GWAS) owing to their effectiveness in correcting for hidden relatedness, population structure and family structure. This essay explores different mathematical approaches to LMMs in GWAS: the linear mixed model with inclusion of all markers (LMMi) and the linear mixed model with exclusion of all markers (LMMe) when calculating the genetic relationship matrix. LMMi is more efficient than LMMe when applied in studies of randomly ascertained quantitative traits. The LMM approaches are classified based on their applicability and performance. Two computational GWAS tools based on LMM approaches, PLINK and EMMAX, were used to analyze unpublished real data from West Africa (Gambia and Ghana). The genetic and residual variance parameters accounting for the phenotypic variation of the disease were estimated to be 0.0594 and 0.0723, respectively. A total of 338408 variants and 959 people (484 males, 405 females and 70 missing phenotypes) passed filters and quality control in PLINK and were used in the study. Among the remaining phenotypes, 864 are cases and 95 are controls.
The performance of different mathematical approaches to LMMs and their software implementations, including EMMAX and PLINK, was compared through application to a GWAS of tuberculosis (TB) in 959 individuals from West Africa (Ghana and Gambia). Of these, 864 TB cases and 95 healthy individuals were retained after quality control (QC) using PLINK, and 329601 autosomal single nucleotide polymorphisms (chromosomes 1 to 22) were included in the analysis after 288 individuals with duplicate IDs were removed during QC.

Results: The SNPs associated with tuberculosis were located on chromosome 17 and chromosome 13, both with a significant step-up false discovery rate value. PLINK failed to correct for hidden relatedness. Although EMMAX reduced the false positive rate, its results still showed a very low level of residual stratification.

Conclusion: This study aimed to understand and explore different approaches to mixed models as applied in genetic studies. An overview of genetic variation, the advantages, successes and applications of mixed models, and the current challenges of mixed models in GWAS were discussed. Moreover, the study showed that SNPs were associated with a particular disease using computational tools that apply LMM approaches. The summary statistics from PLINK and EMMAX identified two causal SNPs associated with TB: rs7225581 on chromosome 17 and rs4941412 on chromosome 13, both with 0.69% FDR H. However, PLINK failed to correct for hidden relatedness. The phenotypic variation analysis showed that all common single nucleotide polymorphisms (SNPs) explained approximately 18.52% of the phenotypic variation of the disease.
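
The step-up false discovery rate reported for the two SNPs can be illustrated with a short sketch. The abstract does not name the procedure, so the code below assumes the Benjamini-Hochberg step-up adjustment; the function name benjamini_hochberg and the example p-values are hypothetical and are not taken from the study's data.

```python
def benjamini_hochberg(p_values):
    """Return step-up FDR-adjusted p-values in the original input order.

    Assumption: the "step up" FDR in the abstract is the Benjamini-Hochberg
    procedure; the abstract itself does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only (not results from the TB data set).
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
```

A SNP would then be called significant at a chosen FDR threshold when its adjusted value falls below that threshold; in a real GWAS the input would be the per-SNP association p-values produced by the analysis tool.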

Full Text