Extension of multifactor dimensionality reduction for identifying multilocus effects in the GAW14 simulated data

Hao Mei, Allison Ashley-Koch, Eden R Martin, Deqiong Ma

doi:10.1186/1471-2156-6-s1-s145

Abstract

Multifactor dimensionality reduction (MDR) is a model-free approach that can identify gene × gene or gene × environment effects in a case-control study. Here we explore several modifications of the MDR method. We extended MDR to allow model selection without cross-validation and to use a chi-square statistic as an alternative to prediction error (PE). We also modified the permutation test to provide different levels of stringency. The extended MDR (EMDR) includes three permutation tests (fixed, non-fixed, and omnibus) to obtain p-values for multilocus models. The goal of this study was to compare the different approaches implemented in the EMDR method and to evaluate their ability to identify genetic effects in the Genetic Analysis Workshop 14 simulated data. We used three replicates from the simulated family data, generating matched pairs from family triads. The results showed that: 1) the chi-square and PE statistics give largely consistent results; 2) the results of EMDR without cross-validation matched those of EMDR with 10-fold cross-validation; 3) the fixed permutation test reports false-positive results in data from loci unrelated to the disease, whereas the non-fixed and omnibus permutation tests perform well in preventing false positives, with the omnibus test being the most conservative. We conclude that the approach without cross-validation can provide accurate results with higher efficiency than 10-fold cross-validation, and that the non-fixed permutation test provides a good compromise between power and false-positive rate.
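To make the two statistics named above concrete, the following is a minimal Python sketch of the core MDR step for a single candidate model: each multilocus genotype cell is labelled high-risk or low-risk by its case:control ratio, and the resulting two-group split is scored by classification error and by a chi-square statistic. It is an illustration under stated assumptions, not the authors' EMDR implementation. The function names `mdr_score` and `best_model`, the ratio threshold of 1.0, the use of the whole-sample classification error (no cross-validation), and the Pearson chi-square on the 2 × 2 risk-group by status table are all assumptions made for the example; the matched-pair design and the permutation tests are not modelled.

```python
from collections import Counter
from itertools import combinations


def mdr_score(genotypes, status, loci, threshold=1.0):
    """Score one multilocus model (hypothetical helper, not the EMDR code).

    genotypes: list of tuples of genotype codes, one tuple per individual
    status:    list of 1 (case) or 0 (control), same order as genotypes
    loci:      tuple of column indices that define the model
    Returns (classification_error, chi_square) on the full sample.
    """
    cases, controls = Counter(), Counter()
    for g, s in zip(genotypes, status):
        cell = tuple(g[i] for i in loci)
        if s == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # Pool cells into a 2 x 2 table:
    # a = cases in high-risk cells, b = controls in high-risk cells,
    # c = cases in low-risk cells,  d = controls in low-risk cells.
    a = b = c = d = 0
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        if n_case > threshold * n_ctrl:  # assumed rule: ratio above threshold
            a += n_case
            b += n_ctrl
        else:
            c += n_case
            d += n_ctrl

    n = a + b + c + d
    error = (b + c) / n  # misclassified: controls in high, cases in low
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return error, chi2


def best_model(genotypes, status, n_loci, order):
    """Exhaustively score every model of the given order and return the
    one with the lowest error (ties broken by the larger chi-square)."""
    scored = []
    for loci in combinations(range(n_loci), order):
        error, chi2 = mdr_score(genotypes, status, loci)
        scored.append((error, -chi2, loci))
    error, neg_chi2, loci = min(scored)
    return loci, error, -neg_chi2
```

As a usage example, if `genotypes` holds three loci per individual and disease status depends only on the combination of the first two, `best_model(genotypes, status, 3, 2)` would be expected to return the locus pair `(0, 1)`. Ranking models by error or by chi-square usually agrees on clear signals like this one, which is in the spirit of the comparison reported above, though this toy scoring makes no claim about the paper's exact statistics.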

Highlights

Gene × gene and gene × environment interactions undoubtedly play an important role in the risk of complex diseases.
The dataset used to validate the EMDR was the simulated Genetic Analysis Workshop 14 (GAW14) data for Kofendrerd Personality Disorder (KPD).
The best 3-locus model included markers near both D1 and D4. This effect was identified by all permutation tests except the non-fixed permutation test with 10-fold cross-validation.

Summary

Introduction

Gene × gene and gene × environment interactions undoubtedly play an important role in the risk of complex diseases. Classic statistical methods (e.g., logistic regression) are commonly used, but as the number of possible interactions increases, the number of interaction terms grows exponentially with the addition of each gene's main effect, leading to overparameterization and low power in high-dimensional models [1]. To address this concern, multifactor dimensionality reduction (MDR) was developed to identify interactions among multiple factors that together influence disease susceptibility [2]. By applying the technique of n-1 cross-validation (keeping n-1 groups for training and leaving out

Objectives

Methods

Results

Conclusion