Abstract

In this paper, we propose using Simple Principal Component Analysis (SPCA) for dimensionality reduction in the vector space model of information retrieval. SPCA is a fast, data-oriented method that does not require computing the variance-covariance matrix. Because SPCA estimates the principal components iteratively, we also propose a criterion for determining convergence, from which the optimum number of iterations for each principal component can be determined. Experimentally, we show that the SPCA-based method offers an improvement over the conventional SVD-based method despite its lower computational cost. This advantage can be attributed to SPCA's iterative procedure, which is similar to clustering methods such as k-means clustering. Moreover, the proposed method, which orthogonalizes the basis vectors, also achieves much higher accuracy than the conventional random projection method based on k-means clustering.
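To illustrate the general idea of estimating principal components iteratively without forming a variance-covariance matrix, the following is a minimal sketch under stated assumptions, not the paper's algorithm. The abstract gives neither the SPCA update rule nor the proposed convergence criterion, so the sketch substitutes a power-iteration-style update applied directly to the data vectors, followed by deflation and Gram-Schmidt orthogonalization. The function name `iterative_components`, its parameters, and the stopping test (cosine between successive estimates above `1 - tol`) are all hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def iterative_components(docs, n_components, max_iter=100, tol=1e-10, seed=0):
    """Estimate orthonormal principal directions directly from the data.

    Assumption: a power-iteration-style update stands in for the SPCA
    update rule, which the abstract does not specify.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    # Center the document vectors.
    mean = [sum(col) / len(docs) for col in zip(*docs)]
    residual = [[x - m for x, m in zip(d, mean)] for d in docs]
    basis = []
    for _ in range(n_components):
        a = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(max_iter):
            # Accumulate each vector weighted by its projection onto the
            # current estimate; no covariance matrix is ever built.
            new = [0.0] * dim
            for x in residual:
                y = dot(a, x)
                for j in range(dim):
                    new[j] += y * x[j]
            new = normalize(new)
            # Hypothetical stopping test (not the paper's criterion):
            # stop once successive estimates are nearly parallel.
            converged = abs(dot(new, a)) > 1.0 - tol
            a = new
            if converged:
                break
        # Gram-Schmidt step keeps the basis vectors orthogonal.
        for b in basis:
            proj = dot(a, b)
            a = [ai - proj * bi for ai, bi in zip(a, b)]
        a = normalize(a)
        basis.append(a)
        # Deflation: remove the found direction from every data vector.
        residual = [
            [xi - dot(x, a) * ai for xi, ai in zip(x, a)] for x in residual
        ]
    return basis


# Toy term-document vectors with most variance along the first axis.
docs = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
basis = iterative_components(docs, n_components=2)
```

Projecting a document or query vector onto the returned `basis` (one dot product per basis vector) gives its reduced representation. The per-component iteration cap `max_iter` is where a convergence criterion such as the one the paper proposes would take effect.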
