Abstract

A key step in any multivariate statistical analysis is choosing variables in line with the main objectives of the study. The procedures usually available for this problem are restricted to a posteriori statistical analysis, using Bayesian approaches or stepwise selection. The main objective of the present paper is to revisit a framework in which the a priori choice of variables makes sense under specific conditions, and to propose a factor analysis model particularly suited to structured quantitative big data. We associated our complete sample of variables with a mixture of two bipolar Watson distributions defined on the n-sphere, W(μ_i, ξ_i), i = 1, 2, where μ_i is a direction parameter and ξ_i is a concentration parameter. The maximum likelihood estimate of the direction parameter μ_i is simply the first principal component of a PCA of cluster i. The Watson mixture was identified by cluster analysis, namely a preliminary hierarchical cluster analysis followed by a k-means partition of the global sample of variables. These multivariate data were explained by an alternative factor analysis model that can deliver directly interpretable solutions without the need for rotation procedures. The loadings of this factor model were obtained by regression. The final communalities of the 16 variables showed that, for a large proportion of them, the unit variance was well explained by the factor model.
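The estimation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are synthetic, the cluster memberships are taken as known rather than obtained by hierarchical clustering and k-means, and the helper names (`unit_rows`, `watson_direction`, `regress_on_factors`) are hypothetical. It assumes each variable is represented as a unit vector in observation space, estimates each direction μ_i as the leading eigenvector of the cluster's scatter matrix, and then regresses every variable on the two estimated directions to obtain loadings and communalities.

```python
import numpy as np

def unit_rows(X):
    """Scale each row to unit Euclidean norm (a point on the sphere)."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def watson_direction(X):
    """Leading eigenvector of the scatter matrix of the unit-norm rows of X.

    For a bipolar Watson distribution (positive concentration) this is the
    maximum likelihood estimate of the direction, defined up to sign.
    """
    U = unit_rows(np.asarray(X, dtype=float))
    scatter = U.T @ U / U.shape[0]
    _, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def regress_on_factors(U, factors):
    """Least-squares regression of each variable (row of U) on the factors.

    Returns the loadings (one row per variable) and the communalities,
    taken here as the share of each variable's squared norm that is
    reproduced by the fitted values (an assumption of this sketch).
    """
    coef, *_ = np.linalg.lstsq(factors, U.T, rcond=None)  # shape (k, p)
    fitted = factors @ coef                               # shape (n, p)
    communalities = np.sum(fitted**2, axis=0) / np.sum(U**2, axis=1)
    return coef.T, communalities

# Synthetic example: 2 clusters of 8 variables, each observed on 50 cases.
rng = np.random.default_rng(0)
n_obs, per_cluster = 50, 8
true_dirs, _ = np.linalg.qr(rng.normal(size=(n_obs, 2)))  # orthonormal columns

clusters = []
for j in range(2):
    # Random signs mimic the axial (bipolar) symmetry of the Watson model.
    signs = rng.choice([-1.0, 1.0], size=(per_cluster, 1))
    noise = 0.03 * rng.normal(size=(per_cluster, n_obs))
    clusters.append(unit_rows(signs * (true_dirs[:, j] + noise)))

mu = [watson_direction(c) for c in clusters]   # estimated directions
factors = np.column_stack(mu)                  # shape (n_obs, 2)
variables = np.vstack(clusters)                # shape (16, n_obs)
loadings, communalities = regress_on_factors(variables, factors)

print(np.round(communalities, 3))
```

Because each variable has unit norm, a communality close to 1 means the two estimated directions reproduce almost all of that variable, which is the sense in which the abstract speaks of unit variance being explained.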
