Bivariate Gaussian Mixture Model Research Articles

Background: Identifying shared genetics is important because it uncovers hidden relationships between complex traits and improves our understanding of disease etiology. Today, genetic correlation is commonly used as the principal measure of genetic overlap. Available methods can calculate genetic correlation from raw genotypes (restricted maximum likelihood, polygenic risk scores), from a set of single-nucleotide polymorphisms (SNPs) that pass the genome-wide significance threshold (Mendelian randomization), or from all data in genome-wide association studies (GWAS), including SNPs that do not reach genome-wide significance (cross-trait linkage disequilibrium (LD) score regression). These methods cannot capture a mixture of effect directions across shared genetic variants (they report only an overall positive, negative, or zero genetic correlation), while recent analyses suggest that the correlation between polygenic traits in the effect size and direction of a SNP is not the same across all SNPs.

Methods: The bivariate Gaussian mixture model provides a novel way to detect and quantify shared genetics independently of genetic correlation. In the absence of genetic correlation we use another measure of polygenic overlap, expressed as the proportion of SNPs associated with both traits. To estimate this quantity, we model true per-SNP effect sizes as a mixture of four bivariate normal distributions: two causal components, each specific to one trait, one causal component of SNPs affecting both traits, and a null component of SNPs with no effect on either trait. We interpret the weight of each component in the mixture as the proportion of SNPs in that component. The parameters of the mixture model are estimated from summary statistics by direct optimization of the likelihood. Our statistical model relates observed signed test statistics (GWAS z-scores) to the underlying per-SNP effect sizes, incorporating the effects of LD structure, minor allele frequency, sample size, cryptic relationships, sample stratification, and sample overlap.

Results: We show in simulations that our model distinguishes cases with no polygenic overlap from cases with significant overlap in the absence of genetic correlation. We show genetic overlap between schizophrenia, waist-hip ratio, and triglycerides, even though the genetic correlation is close to zero. We apply our model to schizophrenia, bipolar disorder, and IQ data, and show that in the presence of genetic correlation our model reports results consistent with those from cross-trait LD score regression.

Discussion: It appears to be common practice to interpret a lack of genetic correlation as no relation between traits. However, a lack of correlation does not necessarily imply independence, a fact well known from statistics. Thus, we argue that certain important cases of genetic overlap are not captured by genetic correlation. Our model quantifies genetic overlap between traits both with and without genetic correlation. It controls for the fact that if both traits are highly polygenic, some proportion of SNPs is associated with both traits by chance or due to LD structure. In addition to providing insights into genetic architecture, our model can improve SNP discovery by estimating the posterior effect size for one trait given the observed GWAS z-score for another trait. Further, more accurately estimated effect sizes can be used to improve polygenic risk scores.
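
The four-component mixture described in the Methods of this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names, the parameter names (`pi1`, `pi2`, `pi12`, `sig1`, `sig2`, `rho12`), and the choice to give trait-specific components exactly zero effect on the other trait are all assumptions made for illustration. It ignores LD, allele frequency, and sample overlap entirely, and uses the plain Pearson correlation of simulated effects as a rough stand-in for genetic correlation. With `rho12 = 0`, a fixed proportion of SNPs affects both traits while the correlation of effects stays near zero.

```python
import math
import random


def simulate_effects(n_snps, pi1, pi2, pi12, sig1, sig2, rho12, seed=0):
    """Draw per-SNP effect pairs (beta1, beta2) from a four-component mixture.

    Hypothetical parameters (not from the source):
      pi1, pi2 - proportions of SNPs affecting only trait 1 / only trait 2
      pi12     - proportion of SNPs affecting both traits
      sig1/2   - effect-size standard deviations for each trait
      rho12    - correlation of effects within the shared component
    The remaining 1 - pi1 - pi2 - pi12 of SNPs are null.
    """
    rng = random.Random(seed)
    effects, labels = [], []
    for _ in range(n_snps):
        u = rng.random()
        if u < pi1:
            effects.append((rng.gauss(0.0, sig1), 0.0))
            labels.append("trait1_only")
        elif u < pi1 + pi2:
            effects.append((0.0, rng.gauss(0.0, sig2)))
            labels.append("trait2_only")
        elif u < pi1 + pi2 + pi12:
            # Correlated bivariate normal draw built from two independent normals.
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            b1 = sig1 * z1
            b2 = sig2 * (rho12 * z1 + math.sqrt(1.0 - rho12 ** 2) * z2)
            effects.append((b1, b2))
            labels.append("shared")
        else:
            effects.append((0.0, 0.0))
            labels.append("null")
    return effects, labels


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Overlap without correlation: 2% of SNPs are shared, but rho12 = 0.
effects, labels = simulate_effects(
    200_000, pi1=0.01, pi2=0.01, pi12=0.02,
    sig1=1.0, sig2=1.0, rho12=0.0, seed=1,
)
shared_fraction = labels.count("shared") / len(labels)
effect_correlation = pearson(effects)
print(f"shared fraction: {shared_fraction:.4f}")
print(f"correlation of effects: {effect_correlation:.4f}")
```

In this toy setting the component weight `pi12` plays the role of the polygenic overlap measure, which remains informative when the correlation of effects carries no signal.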

Summary: In individuals infected with human immunodeficiency virus (HIV), distributions of quantitative HIV ribonucleic acid (RNA) measurements may be highly left-censored, with an extra spike below the limit of detection (LD) of the assay. In univariate analysis, a two-component mixture model with the lower component entirely supported on [0, LD] is recommended to better model the extra spike. Let LD1 and LD2 be the limits of detection for two HIV viral load measurements. When estimating the correlation coefficient between two different measures of viral load obtained from each patient in a sample, a bivariate Gaussian mixture model is recommended to better model the extra spike on [0, LD1] and [0, LD2] when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. When the proportion of both variables falling below LD is very large, the parameters of the lower component may not be estimable, since almost all observations from the lower component fall below LD. A partial solution is to assume that the lower component’s entire support is on [0, LD1]×[0, LD2]. Maximum likelihood is used to estimate the parameters of the lower and higher components. To evaluate whether there is a lower component, we apply a Monte Carlo approach to assess the p-value of the likelihood ratio test and two information criteria: a bootstrap-based information criterion and a cross-validation-based information criterion. We provide simulation results to evaluate the performance of this approach and compare it with two ad hoc estimators and a single-component bivariate Gaussian likelihood estimator. These methods are applied to data from a cohort study of HIV-infected men in Rio de Janeiro, Brazil, and to data from the Women’s Interagency HIV oral study. These results emphasize the need for caution when estimating correlation coefficients from data with a large proportion of non-detectable values when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution.
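
The idea of a lower component supported entirely below the detection limit can be sketched in the simpler univariate case. The code below is an illustrative assumption, not the authors' method: it works on a scale where the upper component is treated as Gaussian, uses hypothetical names (`censored_mixture_loglik`, `p`, `mu`, `sigma`, `ld`), and holds `mu` and `sigma` at their simulated values while scanning a grid of `p`, whereas the abstract describes estimating all parameters by maximum likelihood in the bivariate setting. Each censored observation contributes the probability `p + (1 - p) * Phi((ld - mu) / sigma)`, and each detected observation contributes the Gaussian density scaled by `1 - p`.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def censored_mixture_loglik(observed, n_censored, p, mu, sigma, ld):
    """Log-likelihood of a two-component model with left censoring at ld.

    With probability p a value comes from a lower component lying entirely
    below ld (so it is always censored); otherwise it is Gaussian(mu, sigma)
    and is censored only if it falls below ld.
    """
    prob_below = p + (1.0 - p) * norm_cdf((ld - mu) / sigma)
    ll = n_censored * math.log(prob_below)
    for x in observed:
        z = (x - mu) / sigma
        ll += (math.log(1.0 - p) - math.log(sigma)
               - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)
    return ll


# Simulate data with an extra spike below the detection limit (toy values).
rng = random.Random(7)
TRUE_P, MU, SIGMA, LD = 0.3, 3.0, 1.0, 2.0
observed, n_censored = [], 0
for _ in range(5000):
    if rng.random() < TRUE_P:
        n_censored += 1              # lower component: always below LD
    else:
        x = rng.gauss(MU, SIGMA)
        if x < LD:
            n_censored += 1          # upper component, but censored
        else:
            observed.append(x)

# Scan the mixing proportion on a grid, holding mu and sigma fixed.
grid = [i / 100 for i in range(100)]
best_p = max(
    grid,
    key=lambda p: censored_mixture_loglik(observed, n_censored, p, MU, SIGMA, LD),
)
ll_mixture = censored_mixture_loglik(observed, n_censored, best_p, MU, SIGMA, LD)
ll_single = censored_mixture_loglik(observed, n_censored, 0.0, MU, SIGMA, LD)
print(f"grid estimate of p: {best_p:.2f}")
print(f"log-likelihood gain over p = 0: {ll_mixture - ll_single:.1f}")
```

Setting `p = 0` recovers an ordinary left-censored Gaussian likelihood, so the gain printed at the end indicates how poorly a single Gaussian tail accounts for the spike in this toy data. It is not a calibrated likelihood ratio test, since `mu` and `sigma` are not re-estimated.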

Articles published on Bivariate Gaussian Mixture Model

On Expectation-Maximization Algorithm with Split and Merge in R

Image Segmentation Based on G.O.A for Finding Deformities in Medical and Aura Images

Bayesian learning of Gaussian mixture model for calculating debris flow exceedance probability

A Novel Methodology for Disease Identification Using Metaheuristic Algorithm and Aura Image

Modeling the duration and size of wildfires using joint mixture models

Gaussian Mixture Models Based on Principal Components and Applications

SU22 - BIVARIATE GAUSSIAN MIXTURE MODEL FOR GWAS SUMMARY STATISTICS

A Hybrid Approach for Identification of Manhole and Staircase to Assist Visually Challenged

3DSpectra: A 3-dimensional quantification algorithm for LC-MS labeled profile data.

20th century intraseasonal Asian monsoon dynamics viewed from Isomap

Isomap nonlinear dimensionality reduction and bimodality of Asian monsoon convection

Image Segmentation and Retrievals based on Finite Doubly Truncated Bivariate Gaussian Mixture Model and KMeans

Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences

Correlating Two Continuous Variables Subject to Detection Limits in the Context of Mixture Distributions
