Abstract

Background: Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or jointly (epistatic interactions). For the analysis of high-throughput data, the main difficulty is that the number of SNPs far exceeds the number of samples. This difficulty is amplified when identifying interactions.

Results: In this paper, we propose an Adaptive Group Lasso (AGL) model for large-scale association studies. Our model enables us to analyze SNPs and their interactions simultaneously. We achieve this by introducing a sparsity constraint in our model, based on the fact that only a small fraction of SNPs is disease-associated. To reduce the number of false positive findings, we develop an adaptive reweighting scheme to enhance sparsity. In addition, our method treats SNPs and their interactions as factors and identifies them in a grouped manner. It is therefore flexible enough to accommodate various disease models, especially for interaction detection. However, because model fitting becomes computationally intensive when millions of interaction terms need to be searched, our method needs to be combined with filtering methods when applied to genome-wide data for detecting interactions.

Conclusion: Using a wide range of simulated datasets and a real dataset from the WTCCC, we demonstrate the advantages of our method.
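The combination of grouping, sparsity, and adaptive reweighting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a squared-error loss rather than a case-control likelihood, a plain proximal-gradient solver, and the reweighting rule w_g = 1 / (||beta_g|| + eps). All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm (zero if ||v|| <= t)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, weights, n_iter=500):
    """Proximal gradient for 0.5/n * ||y - X b||^2 + lam * sum_g w_g * ||b_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g, idx in enumerate(groups):
            beta[idx] = group_soft_threshold(z[idx], step * lam * weights[g])
    return beta

def adaptive_group_lasso(X, y, groups, lam, n_rounds=3, eps=1e-3):
    """Refit with weights 1 / (||beta_g|| + eps) so weak groups are penalized harder."""
    weights = np.ones(len(groups))
    for _ in range(n_rounds):
        beta = group_lasso(X, y, groups, lam, weights)
        norms = np.array([np.linalg.norm(beta[idx]) for idx in groups])
        weights = 1.0 / (norms + eps)
    return beta

# Toy data (assumed): 5 groups of 2 columns each, only the first group is active.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 10))
groups = [np.arange(2 * k, 2 * k + 2) for k in range(5)]
beta_true = np.zeros(10)
beta_true[:2] = [3.0, -3.0]
y = X @ beta_true + 0.5 * rng.standard_normal(400)

beta_hat = adaptive_group_lasso(X, y, groups, lam=0.3)
selected = [g for g, idx in enumerate(groups) if np.linalg.norm(beta_hat[idx]) > 0]
print("selected groups:", selected)
```

Because the penalty acts on the norm of each group, all coefficients of a factor enter or leave the model together. The reweighting step lowers the penalty on groups that already carry signal and raises it on the rest, which is one common way to enhance sparsity.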

Highlights

  • Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases

  • For the real case-control study, we use the rheumatoid arthritis (RA) data set from the Wellcome Trust Case Control Consortium (WTCCC)

  • We find that the interaction of the SNP pair is very weak using the standard χ² test based on logistic regression models with df = 4, while the interaction of the SNP pair is strong

Summary

Introduction

Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or jointly (epistatic interactions). For the analysis of high-throughput data, the main difficulty is that the number of SNPs far exceeds the number of samples. This difficulty is amplified when identifying interactions. In genome-wide association (GWA) studies of complex diseases, a few thousand samples are collected and hundreds of thousands of SNPs are genotyped for each sample [1]. One type of genetic variation influences the traits individually. Another type of genetic variation is that SNPs may show little effect individually, but strong effects jointly. This is known as epistasis or multilocus interactions [3]. Identifying epistatic interactions is an important problem in multilocus-based approaches [4].
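To make the scale of the interaction search concrete, the short Python sketch below counts the SNP pairs for a hypothetical panel of 500,000 SNPs and builds one interaction block under an assumed genotype coding (0/1/2, with genotype 0 as the reference level). Under this coding each SNP contributes two indicator columns and each pair contributes 2 x 2 = 4 interaction columns, which corresponds to the df = 4 test mentioned in the highlights. The helper names and the panel size are illustrative assumptions, not taken from the paper.

```python
from math import comb

import numpy as np

def dummy_code(genotypes):
    """Code genotypes 0/1/2 as two indicator columns (genotype 0 is the reference)."""
    g = np.asarray(genotypes)
    return np.column_stack([g == 1, g == 2]).astype(float)

def interaction_block(g1, g2):
    """All products of the indicator columns of two SNPs: 2 x 2 = 4 columns."""
    d1, d2 = dummy_code(g1), dummy_code(g2)
    return np.column_stack([d1[:, i] * d2[:, j] for i in range(2) for j in range(2)])

n_snps = 500_000                 # hypothetical panel size
n_pairs = comb(n_snps, 2)        # 124,999,750,000 candidate SNP pairs
print(f"{n_pairs:,} pairs, {4 * n_pairs:,} interaction columns")

# Four samples genotyped at two SNPs (made-up values).
snp_a = [0, 1, 2, 2]
snp_b = [2, 1, 0, 2]
print(interaction_block(snp_a, snp_b))
```

With only a few thousand samples, a design matrix of this width cannot be fitted directly, which is why the number of candidate pairs has to be reduced before any joint model is estimated.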

