Abstract

Correspondence analysis (CA) and principal component analysis (PCA) are often used to describe multivariate data. In certain applications they have been used for estimation in latent variable models. The theoretical basis for such inference is assessed in generalized linear models where the linear predictor equals α_j + x_i β_j or a_j − b_j(x_i − u_j)², (i = 1,…,n; j = 1,…,m), and x_i is treated as a latent fixed effect. The PCA eigenvectors and CA column scores are evaluated as estimators of β_j and of u_j. With m fixed and n → ∞, consistent estimators cannot be obtained, owing to the incidental parameters problem, unless sufficient "moment" conditions are imposed on the x_i. PCA is equivalent to maximum likelihood estimation for the linear Gaussian model and gives a consistent estimator of β_j (up to a scale change) when the second sample moment of the x_i is positive and finite in the limit. It is inconsistent for the Poisson and Bernoulli distributions, but when b_j is constant its first and/or second eigenvectors can consistently estimate u_j (up to a location and scale change) for the quadratic Gaussian model. In contrast, the CA estimator is always inconsistent. For finite samples, however, the CA column scores often have high correlations with the u_j's, especially when the response curves are spread out relative to one another. The correlations obtained from PCA are usually weaker, although the second PCA eigenvector can sometimes do much better than the first, and for incidence data with tightly clustered response curves its performance is comparable to that of CA. For small sample sizes, PCA and particularly CA are competitive alternatives to maximum likelihood and may be preferred for their computational ease.
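To make the setting concrete, the following minimal sketch (not from the paper; all parameter values and variable names are illustrative) simulates Poisson counts from the quadratic model with constant a_j and b_j, then compares the first-axis CA column scores and the first two PCA eigenvectors against the true optima u_j:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate from the quadratic (Gaussian response) Poisson model ---
# log lambda_ij = a_j - b_j (x_i - u_j)^2, with a_j and b_j held constant.
# Illustrative values only; the paper's simulation designs may differ.
n, m = 100, 20                       # n sites (rows), m species (columns)
x = rng.uniform(0.0, 10.0, size=n)   # latent site scores (fixed effects)
u = np.linspace(1.0, 9.0, m)         # true species optima u_j
a, b = 2.0, 0.5                      # constant intercept and curvature
lam = np.exp(a - b * (x[:, None] - u[None, :]) ** 2)
Y = rng.poisson(lam).astype(float)

# --- CA: column scores from the SVD of the standardized residuals
# from independence (the trivial axis is removed by the subtraction) ---
P = Y / Y.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
_, _, Vt = np.linalg.svd(S, full_matrices=False)
ca_scores = Vt[0] / np.sqrt(c)       # first-axis standard coordinates

# --- PCA: right singular vectors of the column-centered data matrix ---
Yc = Y - Y.mean(axis=0)
_, _, Wt = np.linalg.svd(Yc, full_matrices=False)
pca1, pca2 = Wt[0], Wt[1]

for name, est in [("CA axis 1", ca_scores), ("PCA eig 1", pca1), ("PCA eig 2", pca2)]:
    print(f"{name}: |corr with true u_j| = {abs(np.corrcoef(est, u)[0, 1]):.3f}")
```

With optima spread out along the gradient as here, the CA scores typically track the u_j closely, while much of the information about u_j in PCA tends to shift to the second eigenvector (the familiar arch effect), which is the finite-sample pattern the abstract describes.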
