Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies

Yan Xu, Jessica Su, Li Xing, Weiliang Qiu, Xuekui Zhang

doi:10.1038/s41598-019-50229-6

Abstract

Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach, combined with multiple-testing adjustment, is over-conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension, small-sample-size problem into a low-dimension, large-sample-size problem and borrows information across SNPs by grouping SNPs into three clusters. We pre-specify the cluster patterns by the minor allele frequencies of SNPs in cases versus controls, and enforce the patterns with prior distributions. In simulation studies, our proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identify SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analyses in the literature failed to identify SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP, rs4351714, that has been reported to be related to MM in another study.
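To make the "SNP-wise approach with multiple-testing adjustment" concrete, here is a minimal sketch of FDR adjustment across SNPs. It assumes the Benjamini-Hochberg procedure; the abstract does not say which FDR adjustment was used. The function name and the p-values are hypothetical, chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here as one common FDR adjustment; the source
    does not name the specific procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-SNP p-values from independent SNP-wise tests.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(p)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

With millions of SNPs, the multiplier `m / rank` becomes very large for the top-ranked SNPs, which is one way to see why a purely SNP-wise analysis can lose power.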

Highlights

Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs).
Gamma-Gamma model (GG) [15], Log-Normal-Normal (LNN) [16], extended GG [17], extended LNN [17], eLNN for paired data [18], and Marginal Mixture Distributions (GeneSelectMMD) [19] have been proposed for gene microarray data, and edgeR [20], DESeq [21, 22], and DESeq2 [23] have been proposed for next-generation sequencing (RNAseq) data.
We conducted simulation studies to compare the performance of our model-based clustering method with the SNP-wise approach.

Summary

Introduction

Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). Penalized regression approaches have been proposed for GWASs. For instance, linear mixed models (e.g., Kang et al.; Lippert et al.; Zhou and Stephens 2012 [8]) treat the effect of the SNP marker of interest as fixed and the effects of all other SNP markers as normally distributed random effects. The Gamma-Gamma model (GG), Log-Normal-Normal (LNN), extended GG (eGG), extended LNN (eLNN), eLNN for paired data, and Marginal Mixture Distributions (GeneSelectMMD) have been proposed for gene microarray data, and edgeR [20], DESeq, and DESeq2 [23] have been proposed for next-generation sequencing (RNAseq) data. All of these methods have been successfully applied to either gene microarray data (continuous-scale data) or RNAseq data (count data). To the best of our knowledge, no methods have been proposed that borrow information across SNPs (categorical variables with three genotype levels) to analyze case-control GWAS data with a binary phenotype (cases vs. controls).
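As a small illustration of the data structure described here, the sketch below computes a minor allele frequency (MAF) per group for one SNP and the case-minus-control difference. It assumes the three genotype levels are coded as minor-allele counts 0, 1, and 2, which is a common convention but is not stated in the source. The function names and data are hypothetical, and this is only a descriptive summary, not the authors' clustering model.

```python
def minor_allele_frequency(genotypes):
    """Estimate MAF from genotypes coded as minor-allele counts (0, 1, 2).

    Each subject carries two alleles, hence the factor of 2.
    """
    return sum(genotypes) / (2 * len(genotypes))


def maf_difference(genotypes, is_case):
    """Return MAF in cases minus MAF in controls for a single SNP."""
    cases = [g for g, c in zip(genotypes, is_case) if c]
    controls = [g for g, c in zip(genotypes, is_case) if not c]
    return minor_allele_frequency(cases) - minor_allele_frequency(controls)


# Hypothetical genotypes for one SNP: four cases followed by four controls.
snp = [2, 1, 1, 0, 0, 0, 1, 0]
status = [True, True, True, True, False, False, False, False]
diff = maf_difference(snp, status)
print(diff)
```

The sign of `diff` (positive, near zero, or negative) is the kind of case-versus-control MAF pattern the abstract refers to when it describes pre-specified clusters.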



Journal: Scientific Reports	Publication Date: Sep 23, 2019
Citations: 9	License type: open-access

