Comparisons among several methods for handling missing data in principal component analysis (PCA)

Sébastien Loisel, Yoshio Takane

doi:10.1007/s11634-018-0310-9

Abstract

Missing data are prevalent in many data analytic situations, and those in which principal component analysis (PCA) is applied are no exception. The performance of five methods for handling missing data in PCA is investigated: the missing data passive method, the weighted low rank approximation (WLRA) method, the regularized PCA (RPCA) method, the trimmed scores regression method, and the data augmentation (DA) method. Three complete data sets of varying sizes were selected, in which missing data were created randomly and non-randomly. These data were then analyzed by the five methods, and their parameter recovery capability, as measured by the mean congruence coefficient between loadings obtained from full and missing data, was compared as a function of the number of extracted components (dimensionality) and the proportion of missing data (censor rate). For randomly censored data, all five methods worked well when the dimensionality and censor rate were small. Their performance deteriorated as the dimensionality and censor rate increased, and the deterioration was distinctly faster with the WLRA method. The RPCA method worked best, and the DA method came a close second in terms of parameter recovery. However, the latter, as implemented here, was found to be extremely time-consuming. For non-randomly censored data, the recovery was also affected by the degree of non-randomness in the censoring process. Again the RPCA method worked best, maintaining good to excellent recoveries when the censor rate was small and the dimensionality of solutions was not excessive.
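
To make the evaluation criterion concrete, the sketch below shows one plausible way to censor a complete data matrix at random and to compute a mean congruence coefficient between two loading matrices. It is an illustration only, not the authors' code. The function names `censor_at_random` and `mean_congruence` are hypothetical, Tucker's congruence coefficient is assumed as the definition, and the columns of the two loading matrices are assumed to be already matched (the absolute value only removes the sign ambiguity of each component).

```python
import numpy as np


def censor_at_random(X, censor_rate, rng):
    """Return a float copy of X with each entry independently set to NaN
    with probability `censor_rate` (random censoring; hypothetical helper)."""
    X_missing = np.array(X, dtype=float)  # np.array copies, so X is untouched
    mask = rng.random(X_missing.shape) < censor_rate
    X_missing[mask] = np.nan
    return X_missing


def mean_congruence(A, B):
    """Mean Tucker congruence coefficient between matching columns of the
    loading matrices A and B (variables in rows, components in columns).

    Assumption: column j of A corresponds to column j of B. The absolute
    value makes the result insensitive to the arbitrary sign of a component.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    numerator = np.sum(A * B, axis=0)
    denominator = np.sqrt(np.sum(A**2, axis=0) * np.sum(B**2, axis=0))
    return float(np.mean(np.abs(numerator / denominator)))
```

In use, loadings estimated from the complete data would be passed as `A` and loadings estimated from the censored data by any one of the five methods as `B`; values near 1 indicate good recovery. How the components are rotated or matched before comparison is not specified in the abstract and is left out of this sketch.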

Journal: Advances in Data Analysis and Classification
Publication Date: Jan 18, 2018
Citations: 12

