Abstract

Despite the accumulation of quantitative trait loci (QTL) data for many complex human diseases, most current approaches that attempt to relate genotype to phenotype have achieved limited success, and the genetic factors of many common diseases remain to be elucidated. One reason the problem is hard is the existence of interactions among single nucleotide polymorphisms (SNPs), or epistasis. Because searching the combinatorial space requires an excessive amount of computation, existing approaches cannot fully incorporate high-order SNP interactions into their models and are limited to detecting lower-order interactions. We present an empirical approach based on ridge regression with polynomial kernels and a model selection technique for determining the true degree of epistasis among SNPs. Computer experiments on simulated data show that the proposed method correctly predicts the number of interacting SNPs, provided that the number of samples is large enough relative to the number of SNPs. For cases in which the number of available samples is limited, we propose a sliding window approach that ensures a sufficiently large sample/SNP ratio in each window. In computational experiments using heterogeneous stock mice data, our approach successfully detected subregions that harbor known causal SNPs. Our analysis further suggests the existence of additional candidate causal SNPs that interact with each other in the neighborhood of the known causal gene. Software is available from https://github.com/HirotoSaigo/KDSNP.
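The idea of choosing the interaction order by model selection over polynomial kernel degrees can be illustrated with a minimal sketch. This is not the KDSNP implementation. The function names, the 0/1/2 genotype coding, the simulated phenotype, the ridge penalty `lam`, and the use of 5-fold cross-validation as the selection criterion are all assumptions made for illustration; the abstract does not state which criterion the authors use.

```python
import numpy as np

def poly_kernel(A, B, degree, coef0=1.0):
    """Polynomial kernel (a . b + coef0)^degree between rows of A and B."""
    return (A @ B.T + coef0) ** degree

def krr_fit_predict(X_tr, y_tr, X_te, degree, lam):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then predict."""
    K = poly_kernel(X_tr, X_tr, degree)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_tr)), y_tr)
    return poly_kernel(X_te, X_tr, degree) @ alpha

def cv_error(X, y, degree, lam=1.0, n_folds=5, seed=0):
    """Mean squared error averaged over k folds (assumed selection criterion)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errs = []
    for k in range(n_folds):
        te = folds[k]
        tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = krr_fit_predict(X[tr], y[tr], X[te], degree, lam)
        errs.append(np.mean((pred - y[te]) ** 2))
    return float(np.mean(errs))

# Hypothetical simulated data: 5 SNPs coded 0/1/2, with a purely pairwise
# interaction between SNP 0 and SNP 1 driving the phenotype.
rng = np.random.default_rng(42)
n, p = 300, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = (X[:, 0] - 1) * (X[:, 1] - 1) + 0.1 * rng.standard_normal(n)

# Compare kernel degrees 1 to 3 and keep the one with the lowest CV error.
errors = {d: cv_error(X, y, d) for d in (1, 2, 3)}
best_degree = min(errors, key=errors.get)
print(errors, best_degree)
```

In this toy setting a degree-1 (additive) kernel cannot represent the pairwise term, so its cross-validated error should be clearly higher than that of degree 2. With many SNPs and few samples, the comparison becomes unreliable, which is the situation the sliding window approach is meant to address.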

Introduction

With recent advances in high-throughput genotyping technologies, hundreds of thousands of single nucleotide polymorphisms (SNPs) can now be assayed per person. Despite the accumulation of genome-wide association study (GWAS) data of this kind, current approaches have not always succeeded in explaining the relationships between genotypes and phenotypes for many common, complex, multifactorial human traits.[1] Two major explanations have been proposed for this limited success. The first is the "rare variants-common diseases" hypothesis, which attributes the problem to a lack of sequencing sensitivity and posits the existence of very rare causal SNPs that are not captured at the current sequencing resolution.[2] The second is a limitation of current statistical models, which try to relate genotypes to phenotypes without accounting for the effect of combinations of genes, or epistasis.[3] Addressing these two issues is believed to be key to filling the "missing heritability" gap between genotypes and phenotypes.[2]
