Principal Component Analysis (PCA) seeks a principal component space that contains the essential structure of the data; it is not designed to mine or extract that structure itself. In other words, the principal component space contains not only information related to the essential structure of the data but also unrelated information. This frequently occurs when the intrinsic dimensionality of the data is unknown or when the data have complex distribution characteristics, such as multiple modes or manifold structure. It is therefore unreasonable to separate noise from useful information based solely on reconstruction error. For this reason, PCA is unsuitable as a preprocessing technique for most applications, especially in noisy environments. To solve this problem, this paper proposes a robust PCA based on fuzzy local information reservation (FLIPCA). By analyzing the impact of reconstruction error on sample discriminability, FLIPCA provides a theoretical basis for identifying and processing noise. This not only greatly improves robustness but also extends the applicability and effectiveness of PCA as a data preprocessing technique. Meanwhile, FLIPCA keeps a mathematical formulation consistent with that of traditional PCA, with few adjustable hyperparameters and low algorithmic complexity. Finally, comprehensive experiments on synthetic and real-world datasets substantiate the superiority of the proposed algorithm.