Abstract

Genome-wide association studies (GWAS) play a vital role in identifying important genes that are associated with the phenotypic variations of living organisms. There are several statistical methods for GWAS, including the linear mixed model (LMM), which is popular for addressing the challenges of hidden population stratification and polygenic effects. However, most of these methods, including LMM, are sensitive to phenotypic outliers, which may lead to misleading results. To overcome this problem, in this paper we propose a way to robustify the LMM approach, reducing the influence of outlying observations by using the β-divergence method. The performance of the proposed method was investigated using both synthetic and real data analysis. Simulation results showed that the proposed method performs better than both the linear regression model (LRM) and LMM approaches in terms of power and false discovery rate in the presence of phenotypic outliers. In the absence of outliers, the proposed method performed almost the same as the LMM approach and much better than the LRM approach. In the real data analysis, the proposed method identified 11 SNPs that are significantly associated with rice flowering time. Among the identified candidate SNPs, some were involved in seed development and flowering time pathways, and some were connected with flower and other developmental processes. These candidate SNPs could assist rice breeding programs. Thus, our findings highlight the importance of robust GWAS in identifying candidate genes.

Highlights

  • Laboratory, Infectious Diseases Division, International Centre for Diarrheal Disease Research (icddr,b), Rajshahi, Bangladesh. 4 Agricultural Statistics and ICT Division, Bangladesh Agricultural Research Institute (BARI), Gazipur 1701, Bangladesh. 5 These authors contributed: Zobaer Akond and Md

  • We investigated the performance of the proposed method compared with two popular approaches, LMM and the linear regression model (LRM), using both synthetic and real data analysis

  • To investigate the performance of SNP detection with the synthetic datasets, we considered two original clean simulated datasets that were generated with heritabilities of 0.2 and 0.3, respectively, as described in the Materials and Methods section
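
The idea of generating a clean dataset at a fixed heritability can be illustrated with a minimal sketch. The generative details are not given in this excerpt, so everything here is an assumption made for illustration: a purely additive model, genotypes drawn uniformly from {0, 1, 2}, normally distributed effect sizes, and the hypothetical function name `simulate_phenotype`. The noise is rescaled so that the ratio of genetic variance to the sum of genetic and residual variance equals the requested heritability.

```python
import random
import statistics


def simulate_phenotype(n_ind=200, n_snp=50, n_causal=5, h2=0.3, seed=1):
    """Simulate genotypes and an additive phenotype with heritability h2.

    Illustrative assumptions (not taken from the paper): uniform 0/1/2
    genotype codes, Gaussian effect sizes, independent Gaussian noise.
    """
    rng = random.Random(seed)

    # Genotype matrix: one row per individual, one column per SNP.
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_snp)] for _ in range(n_ind)]

    # Pick the causal SNPs and give each one an effect size.
    causal = sorted(rng.sample(range(n_snp), n_causal))
    effects = {j: rng.gauss(0.0, 1.0) for j in causal}

    # Genetic value of each individual under the additive model.
    g = [sum(effects[j] * row[j] for j in causal) for row in geno]
    var_g = statistics.pvariance(g)

    # Rescale the noise so that var_g / (var_g + var_e) equals h2.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_ind)]
    target_var_e = var_g * (1.0 - h2) / h2
    scale = (target_var_e / statistics.pvariance(noise)) ** 0.5
    e = [scale * x for x in noise]

    y = [gi + ei for gi, ei in zip(g, e)]
    return geno, y, causal, var_g, statistics.pvariance(e)


geno, y, causal, var_g, var_e = simulate_phenotype(h2=0.3)
print("realised heritability:", var_g / (var_g + var_e))
```

Calling the function with `h2=0.2` would give the second setting mentioned above. Phenotypic outliers could then be introduced by replacing a small fraction of the entries of `y` with extreme values.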

Summary

Introduction

Several statistical methodologies have been proposed for GWAS to address the effects of population stratification. However, none of these methods can handle the influence of polygenic effects. To overcome this issue, the linear mixed model (LMM) was proposed, and it is now one of the most popular approaches in GWAS. All of the methods discussed above are highly sensitive to phenotypic outliers and can produce misleading results in the presence of outlying observations. To address this, we attempt to robustify LMM-based GWAS by using a new type of outlier modification rule based on the minimum β-divergence method [20,21]. The performance of the proposed approach was investigated using both simulated and real rice genome datasets related to flowering time.
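
To make the idea of β-divergence-based outlier handling concrete, the following is a minimal sketch for a univariate Gaussian model. It is not the authors' exact modification rule, which is not spelled out in this excerpt. It uses the standard fixed-point form of the minimum β-divergence estimators of a mean and variance, in which each observation receives a weight exp(−β(y − μ)² / (2σ²)). The function names, the value β = 0.2, the weight threshold of 0.05, and the choice to replace flagged values with the robust mean are all assumptions made for illustration.

```python
import math
import statistics


def beta_weights(values, mu, sigma2, beta=0.2):
    """Gaussian beta-weights: close to 1 for typical values, close to 0 for outliers."""
    return [math.exp(-0.5 * beta * (v - mu) ** 2 / sigma2) for v in values]


def min_beta_divergence_estimates(values, beta=0.2, n_iter=100, tol=1e-10):
    """Iteratively reweighted estimates of the mean and variance."""
    # Robust starting point: median and MAD-based variance.
    mu = statistics.median(values)
    mad = statistics.median(abs(v - mu) for v in values)
    sigma2 = (1.4826 * mad) ** 2 or 1.0  # fall back to 1.0 if the MAD is zero

    for _ in range(n_iter):
        w = beta_weights(values, mu, sigma2, beta)
        sw = sum(w)
        new_mu = sum(wi * v for wi, v in zip(w, values)) / sw
        # The factor (1 + beta) corrects the shrinkage caused by the weights.
        new_sigma2 = (1.0 + beta) * sum(
            wi * (v - new_mu) ** 2 for wi, v in zip(w, values)
        ) / sw
        converged = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, sigma2


def modify_outliers(values, beta=0.2, threshold=0.05):
    """Replace values whose beta-weight falls below a (hypothetical) threshold."""
    mu, sigma2 = min_beta_divergence_estimates(values, beta)
    w = beta_weights(values, mu, sigma2, beta)
    cleaned = [mu if wi < threshold else v for v, wi in zip(values, w)]
    return cleaned, w


phenotype = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 50.0]
cleaned, weights = modify_outliers(phenotype)
print(cleaned)
print([round(wi, 3) for wi in weights])
```

In this toy example the value 50.0 receives a weight that is effectively zero and is replaced, while the remaining values are left unchanged. A cleaned phenotype vector of this kind could then be passed to an ordinary LMM fit.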
