Abstract

The PCA algorithm is a typical dimensionality reduction method. It projects high-dimensional data into a lower-dimensional space to obtain a low-dimensional data set that represents the characteristics of the original data set as fully as possible. PCA effectively reduces the dimensionality of high-dimensional data and is widely used in many fields. The PCA computation is tedious, and processing massive streaming data with it is time-consuming. To address both problems, this paper improves the PCA algorithm and proposes DP-PCA, a distributed parallel dimensionality reduction algorithm. Building on PCA theory, DP-PCA makes three improvements. First, the original data set is preprocessed with the “mean” method. Second, the procedure for computing the correlation coefficient matrix is improved. Third, a distributed parallel dimensionality reduction scheme is designed for DP-PCA. In addition, the paper deploys DP-PCA on the Storm platform to parallelize the algorithm and evaluates it experimentally. The experiments show that DP-PCA improves computational efficiency, shortens dimensionality reduction time, and increases the speedup ratio.
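The abstract does not give DP-PCA's formulas or its Storm topology. As a rough illustration only, the Python/NumPy sketch below shows the standard PCA steps the method builds on: mean-centering, forming the correlation coefficient matrix, eigendecomposition, and projection. It assembles the correlation matrix from per-partition partial sums so that partitions could, in principle, be processed in parallel and merged. The function names (partial_stats, merge_stats, correlation_from_stats, pca_from_chunks, project) are hypothetical. This is a generic technique, not the paper's DP-PCA algorithm.

```python
import numpy as np

# Hypothetical sketch: standard PCA on the correlation matrix, with the matrix
# built from per-partition partial sums. NOT the DP-PCA algorithm from the paper.

def partial_stats(chunk):
    """Per-partition statistics: row count, column sums, sum of outer products."""
    chunk = np.asarray(chunk, dtype=float)
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk

def merge_stats(a, b):
    """Combine two partial results; addition is associative, so order does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def correlation_from_stats(stats):
    """Column means, population std devs, and the correlation coefficient matrix."""
    n, s, q = stats
    mean = s / n                               # the "mean" used for centering
    cov = q / n - np.outer(mean, mean)         # population covariance of centered data
    std = np.sqrt(np.diag(cov))
    return mean, std, cov / np.outer(std, std)

def pca_from_chunks(chunks, k):
    """Merge chunk statistics, then keep the k largest eigenpairs of the correlation matrix."""
    stats = partial_stats(chunks[0])
    for c in chunks[1:]:
        stats = merge_stats(stats, partial_stats(c))
    mean, std, corr = correlation_from_stats(stats)
    eigvals, eigvecs = np.linalg.eigh(corr)    # symmetric matrix -> real eigenpairs
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    return mean, std, eigvals[order], eigvecs[:, order]

def project(x, mean, std, components):
    """Standardize rows of x and project them onto the retained components."""
    return ((np.asarray(x, dtype=float) - mean) / std) @ components
```

Because merge_stats only adds counts, sums, and outer-product sums, partial results from different workers can be combined in any order, which is the property a stream processor would need. The one-pass formula `q / n - mean·meanᵀ` can lose precision when column means are large relative to their spread; a production version would likely use a numerically stabler pairwise update.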

