Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes

Afrah Shafquat, Jason G Mezey, Ronald G Crystal

doi:10.1186/s12859-020-3387-z


Journal: BMC Bioinformatics (2020) 21:178
Publication Date: May 7, 2020
Citations: 7
License type: open-access

Affiliation: Cornell University

Abstract

Background: Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact the discovery of disease loci. Although this is well appreciated, almost all analyses of GWAS data take reported disease phenotype values at face value, without accounting for potential misclassification.

Results: Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset. PheLEx consists of a hierarchical Bayesian latent variable model in which differential misclassification is inferred from filtered genotypes, while a full mixed model accounts for population structure and genetic relatedness in study populations. Through simulations, we show that, for realistic allele effect sizes, the PheLEx framework dramatically improves recovery of the correct disease state compared with existing methodologies designed for Bayesian recovery of disease phenotypes. We also demonstrate the potential of PheLEx for extracting new candidate loci from existing GWAS data by analyzing bipolar disorder and epilepsy phenotypes available from the UK Biobank. From the PheLEx analysis of these data, we identified new candidate disease loci, not previously reported for these datasets, that have value for supplemental hypothesis generation.

Conclusion: PheLEx shows promise for reanalyzing GWAS datasets to provide supplemental candidate loci that are overlooked by traditional GWAS analysis methodologies.
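To make the idea of a latent "true" disease state concrete, here is a minimal sketch, not PheLEx's implementation. It assumes a probit link from a known genotype-based liability to the true disease state and fixed false-positive and false-negative rates (a non-differential simplification), then applies Bayes' rule to get the posterior probability that a sample is truly a case given its reported label. The function and parameter names are hypothetical; PheLEx itself infers these quantities jointly within a hierarchical Bayesian model with a full mixed model, which this sketch omits.

```python
from statistics import NormalDist


def posterior_true_case(observed, liability, fp_rate, fn_rate):
    """Posterior P(true disease state = 1 | reported label, genetic liability).

    Hypothetical simplified model (not PheLEx itself):
      true state z ~ Bernoulli(Phi(liability))   # probit prior from genotypes
      P(reported = 1 | z = 0) = fp_rate          # false-positive rate (fixed)
      P(reported = 0 | z = 1) = fn_rate          # false-negative rate (fixed)
    """
    if observed not in (0, 1):
        raise ValueError("observed must be 0 (control) or 1 (case)")
    prior_case = NormalDist().cdf(liability)  # Phi(x_i' beta)
    if observed == 1:
        lik_case, lik_control = 1.0 - fn_rate, fp_rate
    else:
        lik_case, lik_control = fn_rate, 1.0 - fp_rate
    numerator = prior_case * lik_case
    denominator = numerator + (1.0 - prior_case) * lik_control
    if denominator == 0.0:
        raise ValueError("label is impossible under the given rates and prior")
    return numerator / denominator


# A reported control whose genotypes imply high liability receives a larger
# posterior probability of actually being a case, i.e. of being misclassified.
for liability in (-1.0, 0.0, 1.5):
    p = posterior_true_case(observed=0, liability=liability, fp_rate=0.05, fn_rate=0.2)
    print(f"liability={liability:+.1f}  P(true case | reported control)={p:.3f}")
```

In a full Bayesian treatment the liability and the misclassification rates would themselves be sampled rather than fixed, which is what a hierarchical model such as PheLEx provides.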

Highlights

PheLEx compared to existing methods: at present, only one existing misclassification framework is designed for the analysis of GWAS data [49, 73, 77], referred to here as the “Rekaya” method or framework.
Because the accuracy of the misclassification probability under the misclassification model depends on an estimated function of single nucleotide polymorphism (SNP) effects, and because most SNPs in a linkage disequilibrium (LD)-pruned GWAS dataset are typically not associated with the phenotype of interest, PheLEx filters out potentially uninformative SNPs by taking only a subset of statistically significant GWAS genotypes as input. This reduces computational expense and improves accuracy in identifying misclassified samples.
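The SNP-filtering step can be sketched as follows. This is an illustration under stated assumptions, not PheLEx's code: it assumes per-SNP p-values are already available from an initial single-marker GWAS scan, and the `threshold` and `max_snps` values, the helper names, and the SNP IDs in the toy data are all hypothetical rather than the paper's settings.

```python
def filter_snps(pvalues, threshold=1e-4, max_snps=None):
    """Keep SNPs whose single-marker GWAS p-value is below `threshold`.

    pvalues: mapping of SNP ID -> p-value from an initial GWAS scan
    threshold, max_snps: hypothetical tuning parameters (not PheLEx defaults)
    Returns SNP IDs ordered from most to least significant.
    """
    kept = sorted((p, snp) for snp, p in pvalues.items() if p < threshold)
    if max_snps is not None:
        kept = kept[:max_snps]
    return [snp for _, snp in kept]


def subset_genotypes(genotypes, snp_ids, keep):
    """Restrict a samples x SNPs genotype matrix (0/1/2 allele counts) to `keep`."""
    column = {snp: j for j, snp in enumerate(snp_ids)}
    idx = [column[snp] for snp in keep]
    return [[row[j] for j in idx] for row in genotypes]


# Toy example with made-up SNP IDs, p-values, and genotypes.
snp_ids = ["rs_a", "rs_b", "rs_c", "rs_d"]
pvalues = {"rs_a": 3e-7, "rs_b": 0.42, "rs_c": 8e-5, "rs_d": 0.013}
genotypes = [[0, 1, 2, 1], [2, 0, 1, 0], [1, 1, 0, 2]]

selected = filter_snps(pvalues)
print(selected)  # ['rs_a', 'rs_c']
print(subset_genotypes(genotypes, snp_ids, selected))  # [[0, 2], [2, 1], [1, 0]]
```

Only the reduced matrix would then be passed to the latent variable model, which is where the savings in computation come from: the sampler no longer has to estimate effects for the many SNPs that carry no association signal.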

Summary

Introduction

Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact the discovery of disease loci. … interactions [25,26,27,28], as well as methods aimed at extracting the impact of loci with rare variants [29,30,31,32,33]. Together, these innovations in GWAS design and methodology have led to the discovery of candidate loci with noticeable impact in diseases such as type 2 diabetes and schizophrenia, where large-scale consortium studies have enabled isolation of numerous causal loci with low frequency and small effects [2, 34,35,36,37]. Given their potential for immediate impact and implementation at minimal cost, methods that can reliably identify misclassified cases in GWAS could be a promising approach for improving candidate loci discovery.



Similar Papers

Diabetes and Alzheimer's disease: shared genetic susceptibility?
John Hardy ... Valentina Escott-Price
The Lancet Neurology, Vol. 21, 18 Oct 2022

Alcohol Dependence Genetics: Lessons Learned From Genome-Wide Association Studies (GWAS) and Post-GWAS Analyses.
Amy B Hart ... Henry R Kranzler
Alcoholism: Clinical and Experimental Research, Vol. 39, 25 Jun 2015

Methods for genetic epidemiology
Aniket Mishra
06 Nov 2015

Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.
Stefanie Friedrichs ... Juliane Manitz
Computational and mathematical methods in medicine, Vol. 2017, 01 Jan 2017
