Abstract
Copulas provide a modular parameterization of multivariate distributions that decouples the modeling of marginals from the dependencies between them. The Gaussian Mixture Copula Model (GMCM) is a highly flexible copula that can model many kinds of multi-modal dependencies, as well as asymmetric and tail dependencies. GMCMs have been used effectively in clustering non-Gaussian data and in Reproducibility Analysis, a meta-analysis method designed to verify the reliability and consistency of multiple high-throughput genomic experiments. Parameter estimation for GMCM is challenging because its likelihood is intractable. The best previous methods maximize a proxy-likelihood through a Pseudo Expectation Maximization (PEM) algorithm. Those methods provide no guarantee of convergence, or of convergence to the correct parameters. A method called AD-GMCM is developed that uses Automatic Differentiation (AD) to maximize the exact GMCM likelihood. Simulation studies and experiments on real data show that AD-GMCM finds more accurate parameter estimates than PEM and yields better performance in clustering and reproducibility analysis. The advantages of an AD-based approach in addressing problems related to monotonic increase of the likelihood and parameter identifiability in GMCM are discussed. The two well-known cases of degeneracy of maximum likelihood in Gaussian Mixture Models (GMM), which can lead to spurious clustering solutions, are also analyzed for GMCM. The analysis reveals that, unlike GMM, GMCM is not affected in one of the cases.
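To make the source of the estimation difficulty concrete, the GMCM likelihood can be sketched in its standard textbook form. The notation below ($g$, $g_j$, $G_j$, $\pi_k$, $\mu_k$, $\Sigma_k$, $u_{ij}$) is assumed for illustration and is not taken from the abstract itself.

```latex
% Assumed notation: g is a d-dimensional GMM density with K components,
% g_j and G_j are its j-th marginal density and marginal CDF.
\begin{align}
  g(x;\theta) &= \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x;\mu_k,\Sigma_k),
  \qquad
  g_j(x_j;\theta) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(x_j;\mu_{kj},\Sigma_{k,jj}\right), \\
  % Copula density: joint GMM density divided by the product of its marginals,
  % evaluated at the latent points x_j = G_j^{-1}(u_j).
  c(u;\theta) &= \frac{g\!\left(G_1^{-1}(u_1),\dots,G_d^{-1}(u_d);\theta\right)}
                     {\prod_{j=1}^{d} g_j\!\left(G_j^{-1}(u_j);\theta\right)}, \\
  % Log-likelihood over n observations u_1,\dots,u_n in [0,1]^d.
  \ell(\theta) &= \sum_{i=1}^{n} \left[ \log g(x_i;\theta)
                 - \sum_{j=1}^{d} \log g_j(x_{ij};\theta) \right],
  \qquad x_{ij} = G_j^{-1}(u_{ij};\theta).
\end{align}
```

In this form, the inverse marginal CDF $G_j^{-1}$ of a Gaussian mixture has no closed-form expression and itself depends on $\theta$, so the latent points $x_{ij}$ change with every parameter update. This is one way to read why the likelihood is hard to maximize directly.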