Comparison of artificial neural network analysis with other multimarker methods for detecting genetic association

David Curtis

doi:10.1186/1471-2156-8-49

Abstract

Background: Debate remains as to the optimal method for utilising genotype data obtained from multiple markers in case-control association studies. Colleagues and I have previously described a method of association analysis using artificial neural networks (ANNs), whose performance compared favourably to single-marker methods. Here, the performance of ANN analysis is compared with other multi-marker methods, comprising different haplotype-based analyses and locus-based analyses.

Results: Of the methods applied to simulated SNP datasets, heterogeneity testing of estimated haplotype frequencies using asymptotic p values rather than permutation testing had the lowest power and ANN analysis had the highest power. The difference in power to detect association between these two methods was statistically significant (p = 0.001) but other comparisons between methods were not significant. The raw t statistic obtained from ANN analysis correlated highly with the empirical statistical significance obtained from permutation testing of the ANN results and with the p value obtained from the heterogeneity test.

Conclusion: Although ANN analysis was more powerful than the standard haplotype-based test, it is unlikely to be taken up widely. The permutation testing necessary to obtain a valid p value makes it slow to perform, and it is not underpinned by a theoretical model relating marker genotypes to disease phenotype. Nevertheless, the superior performance of this method does imply that the widely-used haplotype-based methods for detecting association with multiple markers are not optimal and that efforts could be made to improve upon them. The fact that the t statistic obtained from ANN analysis is highly correlated with the statistical significance suggests that ANN analysis could be used in situations where large numbers of markers have been genotyped, since the t value could serve as a proxy for the p value in preliminary analyses.

Highlights

Debate remains as to the optimal method for utilising genotype data obtained from multiple markers in case-control association studies
When the asymptotic p value is used for the haplotype analysis, as in practice would usually be the case for an initial screen, artificial neural network (ANN) analysis has a power advantage which would have practical implications in the real world
The results do suggest that there is room to develop new methods which might share the advantages of ANN analysis in terms of implementing a parsimonious approach to detect the patterns of multi-marker genotypes which can be observed when an associated susceptibility locus is present

Summary

Introduction

Debate remains as to the optimal method for utilising genotype data obtained from multiple markers in case-control association studies. One approach uses a likelihood ratio test for heterogeneity of haplotype frequencies between cases and controls. This test has a number of degrees of freedom equal to one less than the number of haplotypes estimated to be present, and the number of haplotypes is typically 2^m if there are m biallelic markers [4]. Rather than test for heterogeneity of haplotype frequencies between cases and controls, one may seek to model the effects of haplotypes on risk of affection. This different, albeit related, approach is implemented in the UNPHASED program and involves estimating haplotype frequencies and carrying out logistic regression analysis with the individual haplotypes modelled to confer different risks of affection [5,6].
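
To make the degrees-of-freedom argument concrete, the following is a minimal sketch of a likelihood ratio test for heterogeneity of haplotype counts between cases and controls. It is an illustration only and not the implementation used in this paper or in any cited program: the function names are hypothetical, and it assumes haplotype counts are directly observed, whereas in practice haplotype frequencies must be estimated from unphased genotypes. With m biallelic markers there are at most 2^m haplotypes, so the test has at most 2^m - 1 degrees of freedom.

```python
import math


def max_haplotypes(m):
    """Upper bound on distinct haplotypes for m biallelic markers."""
    return 2 ** m


def haplotype_heterogeneity_lrt(case_counts, control_counts):
    """Likelihood ratio (G) statistic for heterogeneity of haplotype counts.

    ASSUMPTION: counts are treated as observed haplotype counts. Real
    analyses estimate haplotype frequencies from unphased genotype data.
    Returns (statistic, degrees_of_freedom).
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control count lists must align")

    # Only haplotypes seen at least once contribute to the test.
    pairs = [(a, b) for a, b in zip(case_counts, control_counts) if a + b > 0]
    n_case = sum(a for a, _ in pairs)
    n_control = sum(b for _, b in pairs)
    total = n_case + n_control

    stat = 0.0
    for a, b in pairs:
        pooled_freq = (a + b) / total  # frequency under the null hypothesis
        for observed, group_size in ((a, n_case), (b, n_control)):
            if observed > 0:
                expected = group_size * pooled_freq
                stat += 2.0 * observed * math.log(observed / expected)

    # One less than the number of haplotypes present in the sample.
    df = len(pairs) - 1
    return stat, df
```

The statistic would be referred to a chi-squared distribution with `df` degrees of freedom to obtain an asymptotic p value. Because `df` grows as 2^m - 1, adding markers rapidly increases the degrees of freedom.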

Methods

Results

Discussion

Conclusion