Abstract

Among the many statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies, gene-based interaction analyses are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms (SNPs). This article therefore proposes a gene-based method, GGInt-XGBoost, built on XGBoost. If a pair of genes does not interact, the log odds ratio of the disease trait is assumed to be additive in the two genes. Under this assumption, the difference in error between an XGBoost model fitted with an additive constraint and one fitted without it indicates a gene–gene interaction. A permutation-based test is then used to assess this difference and to provide a p-value representing the significance of the interaction. Experiments on both simulated and real data showed that our approach detected gene–gene interactions better than previous methods.
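Written out, the null hypothesis and test statistic described above take roughly the following form. This is a sketch in notation introduced here for illustration: $\mathbf{G}_A$ and $\mathbf{G}_B$ are the SNP genotypes of genes A and B, $f_A$ and $f_B$ are arbitrary functions (here, boosted tree ensembles), the "error" is assumed to be the mean logistic loss $\ell$, and the $(1 + \cdot)/(1 + B)$ form of the p-value over $B$ permuted data sets is a common convention, not a detail confirmed by the article.

```latex
% No interaction: the log odds are additive in the two genes
H_0:\quad \log\frac{P(Y=1 \mid \mathbf{G}_A, \mathbf{G}_B)}{P(Y=0 \mid \mathbf{G}_A, \mathbf{G}_B)}
      = f_A(\mathbf{G}_A) + f_B(\mathbf{G}_B)

% Mean logistic loss of predicted case probabilities \hat{p}_i
\ell(\hat{p}) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i) \Big]

% Interaction statistic: additive (constrained) loss minus unconstrained loss
D = \ell\big(\hat{p}^{\,\mathrm{add}}\big) - \ell\big(\hat{p}^{\,\mathrm{full}}\big)

% Permutation p-value from statistics D^{(1)}, \dots, D^{(B)} on permuted data
p = \frac{1 + \#\{\, b : D^{(b)} \ge D \,\}}{1 + B}
```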

Highlights

  • Genome-wide association studies (GWAS) comprise a collection of successful methods for identifying genetic loci associated with complex traits

  • The new approach, GGInt-XGBoost, is proposed for identifying gene–gene interactions underlying complex phenotypes at the gene level in case-control studies. It leverages eXtreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016), which has been applied to co-expressed gene detection and to the exploration of genetic associations in bioinformatics (Jiang et al., 2013; Babajide Mustapha and Saeed, 2016; Liu et al., 2016; Liu and Jiang, 2016; Mrozek et al., 2016; Wei et al., 2017a; Wei et al., 2017b; Liu et al., 2017; Chen et al., 2018; Wei et al., 2018; Jiang et al., 2019; Liu et al., 2019; Yu et al., 2020a; Yu et al., 2020b; Lv et al., 2020; Li et al., 2021; Liu et al., 2021)

  • We created a statistic based on XGBoost that quantifies gene–gene interaction (GGI) intensity, in order to test whether two genes interact statistically with respect to a qualitative phenotype
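The statistic described in the highlights can be sketched in plain Python. This is a hedged illustration, not the authors' code: the helper names `log_loss`, `interaction_statistic` and `permutation_p_value` are hypothetical, the error measure is assumed to be mean log loss, and the caller is assumed to supply predicted case probabilities from the two fitted models together with statistics recomputed on permuted data (the permutation scheme itself is not specified here). With the real `xgboost` library, one plausible way to fit the additive model is its `interaction_constraints` parameter, placing the gene-A SNP columns and the gene-B SNP columns in separate groups so that no tree mixes the two genes; the article may impose the constraint differently.

```python
import math
from typing import Sequence


def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy of predicted case probabilities p for 0/1 labels y."""
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and of equal length")
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / len(y)


def interaction_statistic(y: Sequence[int],
                          p_additive: Sequence[float],
                          p_full: Sequence[float]) -> float:
    """Hypothetical D = loss of the additive (constrained) model minus loss of the full model.

    A large D means the unconstrained model, whose trees may combine SNPs from
    both genes, fits noticeably better than the additive one.
    """
    return log_loss(y, p_additive) - log_loss(y, p_full)


def permutation_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """One-sided permutation p-value with the +1 correction, so it is never 0."""
    exceed = sum(1 for d in null_stats if d >= observed)
    return (1 + exceed) / (1 + len(null_stats))


if __name__ == "__main__":
    # Toy inputs (made up for illustration): the full model separates cases better.
    y = [1, 0, 1, 0]
    p_add = [0.6, 0.4, 0.6, 0.4]
    p_full = [0.9, 0.1, 0.9, 0.1]
    d = interaction_statistic(y, p_add, p_full)
    print(round(d, 4))  # 0.4055, i.e. log(0.9 / 0.6)
    # Toy null statistics standing in for values from permuted data sets.
    print(permutation_p_value(d, [0.01, 0.02, 0.5, 0.03]))  # 0.4
```

The +1 correction keeps the p-value strictly positive (its smallest value is 1 / (1 + B)), which matters when only a modest number of permutations can be afforded for each gene pair.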


Summary

INTRODUCTION

Genome-wide association studies (GWAS) comprise a collection of successful methods for identifying genetic loci associated with complex traits. The new approach, GGInt-XGBoost, is proposed for identifying gene–gene interactions underlying complex phenotypes at the gene level in case-control studies. It leverages eXtreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016), which has been applied to co-expressed gene detection and to the exploration of genetic associations in bioinformatics (Jiang et al., 2013; Babajide Mustapha and Saeed, 2016; Liu et al., 2016; Liu and Jiang, 2016; Mrozek et al., 2016; Wei et al., 2017a; Wei et al., 2017b; Liu et al., 2017; Chen et al., 2018; Wei et al., 2018; Jiang et al., 2019; Liu et al., 2019; Yu et al., 2020a; Yu et al., 2020b; Lv et al., 2020; Li et al., 2021; Liu et al., 2021). Its application to real datasets showed accurate identification of gene–gene interactions.

MATERIALS AND METHODS
GGInt-XGBoost
XGBoost With the Additive Constraint
Illustration of the GGInt-XGBoost Workflow
Simulation Study
Disease Model
Experiments Using Rheumatoid Arthritis Data
DATA AVAILABILITY STATEMENT