Abstract

Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects between 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects, and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), which showed strong correlations (r ~ 0.98) between their importance measures and those of RFs grown with 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels. Within a hypothetical two-stage design, pairwise LR analysis of all markers with significant RF single importance would have reduced the number of possible combinations in our small data set by 61%, whereas joint importance measures would have been less efficient for marker pair reduction. This suggests that RF single importance measures, which are able to detect a wide range of interaction effects and are computationally very efficient, might be exploited as a pre-screening tool for larger association studies. Follow-up analysis, such as by LR, is required since RFs do not indicate high-risk genotype combinations.
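The marker-pair arithmetic behind such a two-stage design can be sketched as follows. This is a minimal illustration, assuming that the first stage retains k of n markers and that the second stage tests every unordered pair among the retained markers. The function names and the value k = 10 are hypothetical; the abstract does not state how many markers passed the RF single importance screen.

```python
from math import comb


def pair_count(n_markers: int) -> int:
    """Number of unordered marker pairs among n_markers."""
    return comb(n_markers, 2)


def pair_reduction(n_total: int, n_retained: int) -> float:
    """Fraction of all possible pairs that no longer need testing
    when only n_retained of n_total markers survive pre-screening."""
    if not 0 <= n_retained <= n_total:
        raise ValueError("n_retained must lie between 0 and n_total")
    all_pairs = pair_count(n_total)
    if all_pairs == 0:
        raise ValueError("n_total must be at least 2")
    return 1.0 - pair_count(n_retained) / all_pairs


if __name__ == "__main__":
    n = 20  # markers in the data set, as stated in the abstract
    k = 10  # hypothetical number of markers retained by the screen
    print(f"all pairs among {n} markers: {pair_count(n)}")
    print(f"pairs among {k} retained markers: {pair_count(k)}")
    print(f"reduction: {pair_reduction(n, k):.1%}")
```

Because the number of pairs grows quadratically with the number of markers, even a moderate single-marker screen removes a large share of the pairwise tests, which is why this kind of pre-screening becomes more attractive as studies get larger.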

Highlights

  • The analysis of genetic association studies for complex diseases requires the identification of significant single marker and interaction signals among a vast background of noise

  • Our study investigated the extent to which logistic regression (LR) single-locus and pairwise interaction effects correspond to random forest (RF) single- and joint-importance measures on a small data set of 20 markers (Genetic Analysis Workshop 15 (GAW15) rheumatoid arthritis (RA) candidate gene data set)

  • Our study explored the comparability of parametric (LR) with nonparametric (RFs) analysis techniques when investigating the presence of single-locus and pairwise effects between 20 markers of the North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set

Summary

Introduction

The analysis of genetic association studies for complex diseases requires the identification of significant single marker and interaction signals among a vast background of noise. One rare single-nucleotide polymorphism (SNP), HugotSNP8ms (minor allele frequency
