Abstract

Dimensionality reduction is an essential preprocessing step in data mining. Principal component analysis (PCA) is the classical dimensionality reduction method, and many variants have been built on it. However, all of these methods require at least one transposition and one quadrature operation on the original high-dimensional matrix, and the reduced representation loses the meaning of the original data, which makes further analysis of classification or clustering results difficult. In this paper we develop a novel algorithm named DRWPCA. It does not map the original data into a space of a different dimension; instead, it reduces dimension by analyzing the correlation between dimensions, so the physical meaning of the original data set is retained. It uses mathematical statistics to obtain the correlation coefficient, or degree of correlation, between attributes. Based on a statistical analysis of these degrees of correlation, highly correlated features are removed to reduce the dimension. DRWPCA is inspired by the correlation coefficient, one of the numerical characteristics of a random variable, and by the sliding window model used for traffic control in network engineering. Experimental results demonstrate that DRWPCA provides promising accuracy and a greater ability to reduce dimension, while preserving the original information of the data.
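The abstract does not spell out the DRWPCA procedure, so the following is only a minimal sketch of the general idea it describes: computing pairwise correlation coefficients between attributes and dropping attributes that are highly correlated with ones already kept, so that the surviving columns are original features rather than projections. The function name `drop_correlated_features`, the `threshold` and `window` parameters, and the greedy left-to-right order are all assumptions made for illustration. In particular, `window` is just one possible reading of the sliding window idea and is not taken from the paper.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.9, window=None):
    """Return indices of the columns kept after a correlation filter.

    Hypothetical illustration, not the paper's algorithm.
    threshold: absolute Pearson correlation at or above which a feature
               counts as redundant (assumed value).
    window:    how many of the most recently kept features each candidate
               is compared against (assumed; None means all kept features).
    Note: constant columns yield NaN correlations and are not handled here.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Absolute Pearson correlation between columns (columns are variables).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    if window is None:
        window = n_features
    kept = []
    for j in range(n_features):
        # Only the last `window` kept features fall inside the window.
        recent = kept[max(0, len(kept) - window):]
        if all(corr[j, k] < threshold for k in recent):
            kept.append(j)
    return kept


# Usage on synthetic data: column 1 is a near copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
kept = drop_correlated_features(X, threshold=0.9)
X_reduced = X[:, kept]  # columns are untouched original attributes
```

Because the output is a subset of the original columns, each retained dimension keeps its physical meaning, which is the property the abstract contrasts with PCA. The threshold and the comparison order both affect which attribute of a correlated pair survives, so they would need to be chosen to match the actual method.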
