Abstract

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling's T² statistic to evaluate interaction models, but it is well known that Hotelling's T² statistic is highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic as the best one. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.

Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate non-parametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.

Highlights

  • For Quantitative MDR (QMDR), the penetrance for continuous phenotypes was defined as a function of mean [15]

  • We considered four phenotypes related to kidney function: blood urea nitrogen (BUN), serum creatinine, urinary albumin levels, and urinary red blood cell (RBC) levels

Introduction

Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Analyzing a single locus is not enough to understand the pathophysiology of complex diseases and results in the so-called missing heritability problem. To overcome this problem, several studies have sought to identify gene–gene interactions (GGIs) or gene–environment interactions [4,5,6]. As a non-parametric, model-free approach, multifactor dimensionality reduction (MDR) has been widely applied for detecting GGIs [5]. For binary phenotypes, such as those analyzed in case–control studies, MDR collapses high-dimensional genotype combinations into a one-dimensional variable with two groups (high-risk and low-risk), according to whether the ratio of cases to controls in each genotype combination exceeds a threshold. MDR has several advantages: (i) the dimensions of the data are effectively reduced, (ii) no specific genetic model is assumed, and (iii) high-order interactions can be identified, even if there are no significant main effects [9, 10].
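The high-risk/low-risk labelling step of MDR for a binary phenotype can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation (their R code is linked in the abstract). The function name `mdr_classify`, the 0/1/2 genotype coding, and the default threshold (the overall case-to-control ratio in the sample) are all assumptions made for this example.

```python
from collections import defaultdict


def mdr_classify(genotypes, status, threshold=None):
    """Label each observed multi-locus genotype cell as "high" or "low" risk.

    genotypes: sequence of tuples, one per subject (e.g. (0, 1) for two SNPs).
    status:    sequence of 0/1 values, 1 = case, 0 = control.
    threshold: case-to-control ratio a cell must exceed to be high-risk.
               Assumption: defaults to the overall ratio in the sample.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if threshold is None:
        if n_controls == 0:
            raise ValueError("need at least one control to set a default threshold")
        threshold = n_cases / n_controls

    # Count cases and controls in every observed genotype combination.
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for cell, s in zip(genotypes, status):
        counts[tuple(cell)][0 if s == 1 else 1] += 1

    # A cell is high-risk when cases / controls > threshold; the comparison is
    # written as a product so cells without controls need no division.
    return {
        cell: "high" if cases > threshold * controls else "low"
        for cell, (cases, controls) in counts.items()
    }


if __name__ == "__main__":
    # Toy data: two SNPs, eight subjects (hypothetical values).
    toy_genotypes = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1)]
    toy_status = [1, 1, 0, 0, 0, 1, 1, 0]
    print(mdr_classify(toy_genotypes, toy_status))
```

The returned dictionary is the one-dimensional, two-group variable described above: every subject inherits the label of its genotype cell. A full MDR analysis would repeat this labelling for each candidate locus combination and compare the resulting models, which this sketch leaves out.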
