Estimating an Optimal Correlation Structure from Replicated Molecular Profiling Data Using Finite Mixture Models

Lipi R Acharya, Dongxiao Zhu

doi:10.1109/icmla.2009.53

Abstract

Estimating the correlation structure of a gene set is a ubiquitous problem in pattern analysis of replicated molecular profiling data. However, the commonly used maximum likelihood estimation (MLE) approaches do not automatically accommodate replicated measurements. Often an ad hoc preprocessing step is needed, e.g., averaging (weighted, unweighted, or something in between), which might wipe out important patterns of low magnitude and/or cancel out patterns of similar magnitude. We treat each replicate individually as a random variable and design a finite mixture model to estimate an optimal correlation structure from replicated molecular profiling data. Assuming that the measurements are independent and identically distributed (i.i.d.) samples from a mixture of two multivariate normal distributions, one with a constrained set of parameters and the other with an unconstrained parameter structure, we employ an Expectation-Maximization (EM) algorithm to estimate the component parameters. We carry out a comparative study, including both simulations and real-world data analysis, to assess how well the proposed model and the constrained model given by the first component of the mixture estimate the correlation structure. The two models are further tested for their performance in clustering real-world data. The mixture model approach is shown to perform better overall.
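The estimation scheme summarized above can be illustrated with a minimal sketch of EM for a two-component multivariate normal mixture in which one component is constrained and the other is not. The abstract does not state the form of the constraint, so the diagonal covariance imposed on the first component here is a hypothetical stand-in. The function names (`log_mvn_pdf`, `em_two_component`, `cov_to_corr`), the ridge term, the initialization, and the synthetic data are likewise assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np


def log_mvn_pdf(X, mean, cov):
    """Log-density of each row of X under N(mean, cov)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)        # shape (d, n)
    maha = np.sum(z ** 2, axis=0)               # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def em_two_component(X, n_iter=100, ridge=1e-6, seed=0):
    """EM for a mixture of two multivariate normals.

    Component 0 is constrained (ASSUMPTION: diagonal covariance, chosen only
    for illustration); component 1 has an unconstrained covariance.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    r0 = rng.uniform(0.25, 0.75, size=n)        # random initial responsibilities
    resp = np.column_stack([r0, 1.0 - r0])
    for _ in range(n_iter):
        # M-step: weighted maximum likelihood estimates per component.
        weights = np.clip(resp.sum(axis=0) / n, 1e-12, None)
        means, covs = [], []
        for k in range(2):
            r = resp[:, k]
            mu = r @ X / r.sum()
            diff = X - mu
            S = (diff * r[:, None]).T @ diff / r.sum()
            if k == 0:
                S = np.diag(np.diag(S))         # impose the assumed constraint
            means.append(mu)
            covs.append(S + ridge * np.eye(d))  # small ridge for stability
        # E-step: posterior probability of each component for each sample.
        log_p = np.column_stack([
            np.log(weights[k]) + log_mvn_pdf(X, means[k], covs[k])
            for k in range(2)
        ])
        log_norm = np.logaddexp(log_p[:, 0], log_p[:, 1])
        resp = np.exp(log_p - log_norm[:, None])
    return weights, means, covs, resp


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)


# Synthetic data (not from the paper): one uncorrelated and one correlated group.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 3))
B = rng.multivariate_normal(
    mean=[3.0, 3.0, 3.0],
    cov=[[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]],
    size=200,
)
X = np.vstack([A, B])

weights, means, covs, resp = em_two_component(X)
corr = cov_to_corr(covs[1])  # correlation estimate from the unconstrained component
print(np.round(weights, 3))
print(np.round(corr, 3))
```

In this sketch the constraint is enforced in the M-step by keeping only the diagonal of the weighted scatter matrix, which is the weighted maximum likelihood estimate under a diagonal covariance. How the paper combines the two components into a single correlation estimate is not stated in the abstract, so reading the correlation off the unconstrained component is only one possible choice.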

Full Text