A Problem of Dimensionality in Normal Mixture Analysis

Marco Bee, Bernard Flury

doi:10.1111/1467-9469.00302

Abstract

Suppose the p‐variate random vector W, partitioned into q variables W1 and p − q variables W2, follows a multivariate normal mixture distribution. If the investigator is mainly interested in estimating the parameters of the distribution of W1, there are two possibilities: (1) use only the data on W1 for estimation, or (2) estimate the parameters of the p‐variate mixture distribution and then extract the estimates of the marginal distribution of W1. In this article we study the choice between these two possibilities, mainly for the case of two mixture components with identical covariance matrices. We find the asymptotic distribution of the linear discriminant function coefficients using the work of Efron (1975) and O'Neill (1978), and give a Wald test for redundancy of W2. A simulation study gives further insight into the conditions under which W2 should be used in the analysis: in summary, the inclusion of W2 seems justified if Δ_{2.1}, the Mahalanobis distance between the two component distributions based on the conditional distribution of W2 given W1, is at least 2.
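As an illustration of the quantity Δ_{2.1} named in the abstract, the following is a minimal sketch of how the conditional Mahalanobis distance could be computed from known component means and a common covariance matrix. It relies on the standard partitioned-covariance identity Δ² = Δ_1² + Δ_{2.1}², which is not quoted from the article itself. The function name `conditional_mahalanobis` and its arguments are hypothetical, and in practice the parameters would be replaced by estimates.

```python
import numpy as np


def conditional_mahalanobis(mu_a, mu_b, sigma, q):
    """Distances between two normal components with common covariance sigma.

    Hypothetical helper (not from the article). The first q coordinates
    play the role of W1 and the remaining p - q the role of W2.
    Returns (delta_1, delta_2_given_1, delta_total).
    """
    delta = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    d1, d2 = delta[:q], delta[q:]
    s11, s12 = sigma[:q, :q], sigma[:q, q:]
    s21, s22 = sigma[q:, :q], sigma[q:, q:]

    # Mean difference of W2 after adjusting for W1.
    d2_adj = d2 - s21 @ np.linalg.solve(s11, d1)
    # Covariance matrix of W2 conditional on W1.
    s22_1 = s22 - s21 @ np.linalg.solve(s11, s12)

    dist1_sq = d1 @ np.linalg.solve(s11, d1)
    dist21_sq = d2_adj @ np.linalg.solve(s22_1, d2_adj)
    total_sq = delta @ np.linalg.solve(sigma, delta)
    return np.sqrt(dist1_sq), np.sqrt(dist21_sq), np.sqrt(total_sq)


# Illustrative values only, not taken from the article.
mu_a = [0.0, 0.0, 0.0]
mu_b = [1.0, 2.0, 0.5]
sigma = [[1.0, 0.3, 0.1],
         [0.3, 1.0, 0.2],
         [0.1, 0.2, 1.0]]
d_1, d_21, d_total = conditional_mahalanobis(mu_a, mu_b, sigma, q=1)
# By the abstract's rule of thumb, W2 would be worth including if d_21 >= 2.
include_w2 = d_21 >= 2.0
```

The squared total distance splits into the part carried by W1 alone and the part W2 adds once W1 is accounted for, so Δ_{2.1} = 0 corresponds to W2 being redundant for separating the two components.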
