Abstract

SummaryPrincipal Component Analysis (PCA) is a popular multivariate analytic tool which can be used for dimension reduction without losing much information. Data vectors containing a large number of features arriving sequentially may be correlated with each other. An effective algorithm for such situations is online PCA. Existing Online PCA research works revolve around proposing efficient scalable updating algorithms focusing on compression loss only. They do not take into account the size of the dataset at which further arrival of data vectors can be terminated and dimension reduction can be applied. It is well known that the dataset size contributes to reducing the compression loss – the smaller the dataset size, the larger the compression loss while larger the dataset size, the lesser the compression loss. However, the reduction in compression loss by increasing dataset size will increase the total data collection cost. In this paper, we move beyond the scalability and updation problems related to Online PCA and focus on optimising a cost‐compression loss which considers the compression loss and data collection cost. We minimise the corresponding risk using a two‐stage PCA algorithm. The resulting two‐stage algorithm is a fast and an efficient alternative to Online PCA and is shown to exhibit attractive convergence properties with no assumption on specific data distributions. Experimental studies demonstrate similar results and further illustrations are provided using real data. As an extension, a multi‐stage PCA algorithm is discussed as well. Given the time complexity, the two‐stage PCA algorithm is emphasised over the multi‐stage PCA algorithm for online data.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.