Abstract
Clustering genetic variants based on their associations with different traits can provide insight into their underlying biological mechanisms. Existing clustering approaches typically group variants based on the similarity of their association estimates for various traits. We present a new procedure for clustering variants based on their proportional associations with different traits, which is more reflective of the underlying mechanisms to which they relate. The method is based on a mixture model approach for directional clustering and includes a noise cluster that provides robustness to outliers. The procedure performs well across a range of simulation scenarios. In an applied setting, clustering genetic variants associated with body mass index generates groups reflective of distinct biological pathways. Mendelian randomization analyses support that the clusters vary in their effect on coronary heart disease, including one cluster that represents elevated body mass index with a favourable metabolic profile and reduced coronary heart disease risk. Analysis of the biological pathways underlying this cluster identifies inflammation as potentially explaining differences in the effects of increased body mass index on coronary heart disease.
Highlights
In recent years, the number of genome-wide association studies (GWAS) has grown enormously [1]
Genome-wide association studies have found many genetic variants that are associated with complex traits such as body mass index (BMI)
Genetic association data cannot tell us how these variants influence the trait, or whether they influence the trait in the same way
Summary
The number of genome-wide association studies (GWAS) has grown enormously [1]. Such studies provide valuable information linking genetic variants across the human genome to a wide range of traits. What often remain less understood are the underlying mechanisms by which the associated genetic variants affect the traits. A number of techniques have been implemented to cluster genetic variants based on their associations with traits that are believed to be relevant in informing biological pathways. Clustering approaches that have been applied to genetic variant-trait association estimates include fuzzy c-means [6] and Bayesian nonnegative matrix factorization [3]. A related approach, which aims to determine distinct components of genetic variant-trait associations, uses truncated singular value decomposition [8].
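The distinction drawn in the abstract between similarity of association estimates and proportional associations can be illustrated with a small numerical sketch. This is not the mixture model procedure described here, and it omits the noise cluster. It only compares a magnitude-based measure (Euclidean distance) with a direction-based one (cosine similarity) on made-up numbers. The array `beta`, the variant labels, and both helper functions are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical association estimates (rows: variants, columns: two traits).
# These values are invented for illustration and are not from any study.
beta = np.array([
    [0.10, 0.20],  # variant A
    [0.30, 0.60],  # variant B: same 1:2 proportion as A, three times larger
    [0.20, 0.10],  # variant C: similar magnitude to A, different proportion
])


def euclidean_distance(u, v):
    """Distance between raw association vectors (sensitive to magnitude)."""
    return float(np.linalg.norm(u - v))


def cosine_similarity(u, v):
    """Similarity of direction only (insensitive to overall magnitude)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


a, b, c = beta

# Magnitude-based comparison: A appears closer to C than to B.
print("Euclidean A-B:", euclidean_distance(a, b))
print("Euclidean A-C:", euclidean_distance(a, c))

# Direction-based comparison: A and B point the same way, C does not.
print("Cosine A-B:", cosine_similarity(a, b))
print("Cosine A-C:", cosine_similarity(a, c))
```

Under these assumed values, a distance on the raw estimates would pair variant A with variant C, whereas a directional measure pairs A with B because their associations with the two traits are in the same proportion.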