Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

Mousheng Xu, Amy Damask, Kelan G Tantisira, Ann Wu, Blanca E Himes, Scott T Weiss, Augusto A Litonjua, Jen-Hwa Chu

doi:10.1186/1471-2350-12-90

Abstract

Background: Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), whereas complex diseases are usually affected by many genetic and environmental factors, each contributing a small portion to the outcome. We hypothesized that using random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics.

Methods: Using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We then predicted severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group.

Results: Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 using 160-320 SNPs, compared with an AUC of 0.57 using 10 SNPs. Using the clinical traits alone yielded an AUC of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors.

Conclusions: Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS to integrate large numbers of predictors simultaneously.
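The AUC figures reported in the abstract can be read as the probability that a randomly chosen exacerbation case receives a higher prediction score than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that rank-based computation. The function name `auc` and the scores are made up for illustration and are not taken from the study.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: share of (case, control) pairs where the case scores higher.

    Ties contribute 0.5, so a scorer that cannot separate the groups gets 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical prediction scores, for illustration only.
cases = [0.9, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
print(auc(cases, controls))
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.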

Highlights

Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history.
The evidence for weak predictability in the random single nucleotide polymorphism (SNP) control models likely comes from the clinical traits used in the models rather than the random SNPs. These results indicate that the SNPs selected by Random Forests (RF) contain information about exacerbation, while the random SNPs do not.
Depending on the portion of the ROC curve that is used, this can equate to a positive predictive value (PPV) of 0.81 and a negative predictive value (NPV) of 0.74, given a proportion of exacerbators of 0.3 (as shown in Table 1) and a scoring threshold corresponding to sensitivity = 0.2 and specificity = 0.95. This allows for reasonable prediction of asthma exacerbations.
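Converting a ROC operating point into predictive values follows Bayes' rule from sensitivity, specificity, and the proportion of exacerbators. The following is a minimal sketch under the standard definitions. The function name `predictive_values` is illustrative and does not come from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Operating point quoted in the highlight above.
ppv, npv = predictive_values(sensitivity=0.2, specificity=0.95, prevalence=0.3)
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

At low sensitivity the PPV is very sensitive to the chosen specificity. With prevalence 0.3 and sensitivity 0.2, these formulas give a PPV of about 0.63 at specificity 0.95 and about 0.81 at specificity 0.98, while the NPV stays close to 0.74 in both cases.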

Summary

Introduction

Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. We hypothesized that using random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. As a field, personalized medicine faces multiple issues when trying to predict complex diseases such as cardiovascular diseases, cancer, and asthma. This is largely because no single genotypic or phenotypic characteristic can explain more than a small portion of any complex disease. An added benefit of Random Forests is that the decision trees naturally handle interactions among input variables.
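To illustrate why tree-style splits can capture interactions, the following minimal sketch uses made-up data for two hypothetical binary SNP indicators, `a` and `b`, whose outcome depends only on their combination. It is a hand-written two-level split, not a Random Forests implementation and not the paper's analysis. All names and values are assumptions.

```python
# Toy data (made up): 25 samples per genotype combination.
# The outcome is 1 only when exactly one of the two variants is present.
samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1) for _ in range(25)]


def case_rate(rows):
    """Fraction of rows whose outcome is 1."""
    return sum(outcome for _, _, outcome in rows) / len(rows)


# Marginal view: each SNP alone leaves the case rate at 0.5 on both sides.
marginal_a = [case_rate([r for r in samples if r[0] == v]) for v in (0, 1)]
marginal_b = [case_rate([r for r in samples if r[1] == v]) for v in (0, 1)]

# Tree view: split on a, then on b within each branch; every leaf is pure.
leaves = {
    (va, vb): case_rate([r for r in samples if r[0] == va and r[1] == vb])
    for va in (0, 1)
    for vb in (0, 1)
}

print(marginal_a, marginal_b)
print(leaves)
```

A single-variable test sees nothing here, whereas the nested split separates cases from controls completely. This pure exclusive-or pattern is an extreme case chosen for clarity; a greedy tree generally needs some signal to select its first split.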

Methods

Results

Discussion

Conclusion