ABSTRACT
Measurement errors are ubiquitous in the experimental sciences. Depending on the experimental platform used to acquire data, different types of error are introduced, amounting to an admixture of additive and multiplicative error components that can be uncorrelated or correlated. In this paper, we use numerical simulations to investigate the effect of different types of experimental error on the recovery of the subspace with principal component analysis (PCA). Specifically, we assessed how different error characteristics (variance, correlation, and correlation structure), loading structures, and data distributions influence the accuracy with which an error‐free (true) subspace can be estimated from sampled data with PCA. Quality was assessed in terms of the mean squared reconstruction error and the congruence to the error‐free loadings, using the pseudorank and adjusting for rotational ambiguity. Analysis of variance reveals that the error variance, the error correlation structure, and their interaction with the loading structure are the factors that most affect the quality of loading estimation from sampled data. We advocate for the need to characterize and assess the nature of measurement error and the need to adapt formulations of PCA that can explicitly take error structures into account in the model fitting.
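
The kind of simulation summarized above can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the function names, the default parameter values, the Gaussian scores, and the equicorrelated additive error are all assumptions chosen for illustration. The sketch generates low-rank data, contaminates it with additive and multiplicative error, fits PCA by singular value decomposition, and computes the two quality measures named in the abstract (mean squared reconstruction error and loading congruence after an orthogonal Procrustes rotation to handle rotational ambiguity).

```python
import numpy as np


def simulate(n=200, p=10, rank=2, sigma_add=0.1, sigma_mult=0.05, rho=0.5, seed=0):
    """Return (X_true, X_obs, P_true) for a hypothetical low-rank data model."""
    rng = np.random.default_rng(seed)
    scores = rng.standard_normal((n, rank))
    # Orthonormal "true" loadings (assumed structure, for illustration only).
    P_true, _ = np.linalg.qr(rng.standard_normal((p, rank)))
    X_true = scores @ P_true.T

    # Additive error with equal correlation rho between all variables:
    # a shared component plus an independent component per variable.
    shared = rng.standard_normal((n, 1))
    indep = rng.standard_normal((n, p))
    E_add = sigma_add * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * indep)

    # Uncorrelated multiplicative error, proportional to the signal.
    E_mult = sigma_mult * rng.standard_normal((n, p))

    X_obs = X_true * (1.0 + E_mult) + E_add
    return X_true, X_obs, P_true


def pca_loadings(X, rank):
    """Loadings of the first `rank` principal components of column-centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:rank].T


def congruence_after_rotation(P_true, P_hat):
    """Per-component congruence after rotating P_hat toward P_true."""
    # Orthogonal Procrustes: R minimizes ||P_hat @ R - P_true||_F.
    U, _, Vt = np.linalg.svd(P_hat.T @ P_true)
    P_rot = P_hat @ (U @ Vt)
    num = np.sum(P_true * P_rot, axis=0)
    den = np.linalg.norm(P_true, axis=0) * np.linalg.norm(P_rot, axis=0)
    return num / den


def reconstruction_mse(X_true, X_obs, rank):
    """Mean squared error between X_true and the rank-`rank` PCA fit of X_obs."""
    mu = X_obs.mean(axis=0)
    P_hat = pca_loadings(X_obs, rank)
    X_hat = (X_obs - mu) @ P_hat @ P_hat.T + mu
    return float(np.mean((X_true - X_hat) ** 2))


if __name__ == "__main__":
    X_true, X_obs, P_true = simulate()
    P_hat = pca_loadings(X_obs, rank=2)
    print("congruence:", congruence_after_rotation(P_true, P_hat))
    print("reconstruction MSE:", reconstruction_mse(X_true, X_obs, rank=2))
```

Varying `sigma_add`, `sigma_mult`, and `rho` over a grid and repeating over seeds would give a small factorial experiment of the sort that an analysis of variance could then be applied to; here the rank is fixed in advance rather than estimated as a pseudorank.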