Abstract

Using single-nucleotide polymorphism (SNP) genotypes from the 1000 Genomes Project pilot3 data provided for Genetic Analysis Workshop 17 (GAW17), we applied Bayesian network structure learning (BNSL) to identify potential causal SNPs associated with the Affected phenotype. We focused on the setting in which target genes that harbor causal variants have already been chosen for resequencing, so the goal was to detect the true causal SNPs among the measured variants in these genes. Using all available SNPs in the known causal genes, BNSL produced a Bayesian network from which we identified subsets of SNPs connected to the Affected outcome; each subset was assessed for statistical significance using the hypergeometric distribution. In the Asian population, the exploratory phase of the pooled-replicate analysis sometimes identified a set of involved SNPs that contained more true causal SNPs than expected by chance. Analyses of single replicates gave inconsistent results, and no nominally significant results were found in analyses of the African or European populations. Overall, the method did not identify sets of involved SNPs that included a higher proportion of true causal SNPs than expected by chance alone. We conclude that this method, as currently applied, is not effective for identifying causal SNPs that follow the simulation model for the GAW17 data set, which includes many rare causal SNPs.
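
The hypergeometric assessment mentioned in the abstract can be illustrated with a small sketch. It assumes the test is a one-sided enrichment test: given N candidate SNPs of which K are truly causal, how likely is it that a selected subset of n SNPs contains k or more causal SNPs by chance? The function name, the argument names, and the numbers in the usage note are hypothetical and are not taken from the GAW17 analysis.

```python
from math import comb


def hypergeom_upper_tail(total, causal, selected, hits):
    """Return P(X >= hits) for a hypergeometric variable X.

    total    -- number of candidate SNPs (N); hypothetical name
    causal   -- number of true causal SNPs among them (K)
    selected -- size of the subset picked by the method (n)
    hits     -- true causal SNPs observed in that subset (k)
    """
    if not (0 <= causal <= total and 0 <= selected <= total):
        raise ValueError("need 0 <= causal <= total and 0 <= selected <= total")
    upper = min(causal, selected)  # largest possible number of hits
    # Sum the numerators of the probability mass function as exact integers,
    # then divide once by the number of possible subsets.
    numerator = sum(
        comb(causal, i) * comb(total - causal, selected - i)
        for i in range(max(hits, 0), upper + 1)
    )
    return numerator / comb(total, selected)
```

For instance, `hypergeom_upper_tail(10, 3, 4, 2)` gives the chance of drawing at least 2 of 3 causal SNPs when 4 of 10 SNPs are selected at random; a small value would indicate more causal SNPs in the subset than expected by chance.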

Highlights

  • Our analyses focused on pooled replicates; analyses of single replicates did not perform well, possibly because of the small sample size and corresponding low power

  • We examined three methods for identifying a subset of single-nucleotide polymorphisms (SNPs) closely related to the disease outcome (the Affected variable): the descendants of Affected (DA), the Markov blanket of Affected (MBA), and the children of Affected (CA)
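
The three subsets named above have standard graph-theoretic definitions, which can be sketched for a learned network as follows. This is a minimal illustration, assuming the network is a directed acyclic graph stored as a dictionary that maps each node to its list of children; the helper names and the toy graph in the usage note are hypothetical and do not reproduce the authors' software.

```python
from collections import deque


def children(dag, node):
    """Nodes with an edge coming directly from `node`."""
    return set(dag.get(node, ()))


def parents(dag, node):
    """Nodes with an edge pointing directly to `node`."""
    return {p for p, kids in dag.items() if node in kids}


def descendants(dag, node):
    """All nodes reachable from `node` by following directed edges."""
    seen = set()
    queue = deque(dag.get(node, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(dag.get(current, ()))
    return seen


def markov_blanket(dag, node):
    """Parents, children, and the children's other parents of `node`."""
    kids = children(dag, node)
    blanket = parents(dag, node) | kids
    for kid in kids:
        blanket |= parents(dag, kid)
    blanket.discard(node)  # the node is not part of its own blanket
    return blanket
```

With the toy graph `{"snp4": ["Affected"], "Affected": ["snp1"], "snp3": ["snp1"], "snp1": ["snp2"]}`, CA contains only `snp1`, DA contains `snp1` and `snp2`, and MBA contains `snp1`, `snp3`, and `snp4`. CA is always a subset of both DA and MBA, which is why the three sets can differ in how many SNPs they flag.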

Summary

Introduction

With ongoing advances in technology, it is now possible to follow up regions of genetic linkage or association with high-throughput next-generation sequencing, which can identify novel variants, both common and rare, to be tested for association with the disease under study. Bayesian network structure learning (BNSL) has been a successful data analysis tool in many other areas of biology, such as cell signaling pathways, systems biology, genetic data analysis, and prediction-based classification of disease [1,2,3,4,5]. These models create networks that extract pronounced features from the data and attempt to minimize bias from overfitting or sampling error [5].
