Abstract
Principal Component Analysis (PCA) is one of the most widely used dimensionality reduction methods in data analysis, renowned for its ability to reveal the underlying structure of datasets. However, in the era of big data, characterized by high-dimensional, large-scale, noisy, and dynamic datasets, traditional PCA faces significant limitations. This paper reviews the challenges PCA faces in big data environments and explores key extensions developed to enhance its applicability. Beginning with an overview of PCA's mathematical principles, the paper identifies its inefficiency on massive datasets, its sensitivity to noise, and the difficulties of applying it in real-time settings. To address these limitations, various extensions of PCA have been developed, including Incremental PCA, Sparse PCA, Kernel PCA, and Robust PCA. This survey further discusses practical applications of PCA in big data domains, including biological analysis, financial analysis, and image processing. It also examines future directions of PCA research, such as combining PCA with advanced machine learning models, leveraging quantum computing to improve efficiency, and ensuring privacy in PCA applications. This review aims to deepen the understanding of PCA in big data analysis, address its challenges, and highlight innovative solutions that enhance its efficiency and its capability to handle high-dimensional and complex datasets.
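As a point of reference for the classical method the survey builds on, the following is a minimal sketch of standard PCA computed via the singular value decomposition of centered data. It assumes NumPy is available; the function name `pca` and all variable names are illustrative, not taken from the paper.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: center the data, take the SVD, and
    project onto the top-k right singular vectors (the principal
    components)."""
    X_centered = X - X.mean(axis=0)                         # center each feature
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                                     # top-k principal directions
    scores = X_centered @ components.T                      # low-dimensional projection
    explained_var = (S[:k] ** 2) / (X.shape[0] - 1)         # variance along each component
    return scores, components, explained_var

# Illustrative usage: reduce 1,000 samples with 50 features to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
scores, components, var = pca(X, k=2)
print(scores.shape)  # (1000, 2)
```

The cost of the full SVD grows quickly with the size of the data matrix, which is one concrete form of the scalability limitation motivating the extensions (e.g., Incremental PCA) that the survey reviews.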