Latent variables analysis is an important part of psychometric research. In this context, factor analysis and other related techniques have been widely applied for the investigation of the internal structure of psychometric tests. However, these methods perform a linear dimensionality reduction under a series of assumptions that could not always be verified in psychological data. Predictive techniques, such as artificial neural networks, could complement and improve the exploration of latent space, overcoming the limits of traditional methods. In this study, we explore the latent space generated by a particular artificial neural network: the variational autoencoder. This autoencoder could perform a nonlinear dimensionality reduction and encourage the latent features to follow a predefined distribution (usually a normal distribution) by learning the most important relationships hidden in data. In this study, we investigate the capacity of autoencoders to model item-factor relationships in simulated data, which encompasses linear and nonlinear associations. We also extend our investigation to a real dataset. Results on simulated data show that the variational autoencoder performs similarly to factor analysis when the relationships among observed and latent variables are linear, and it is able to reproduce the factor scores. Moreover, results on nonlinear data show that, differently than factor analysis, it can also learn to reproduce nonlinear relationships among observed variables and factors. The factor score estimates are also more accurate with respect to factor analysis. The real case results confirm the potential of the autoencoder in reducing dimensionality with mild assumptions on input data and in recognizing the function that links observed and latent variables.
Read full abstract