Abstract

Few studies have addressed the clustering of high-dimensional data collected by many sensors in a real-time streaming environment. Existing clustering algorithms for high-dimensional data are primarily based on batch processing models, and most cannot handle incremental high-dimensional data streams, which are extremely common in practical applications. To address these problems, this paper studies high-dimensional data clustering based on stream processing and proposes a high-dimensional data stream clustering algorithm based on a feedback control system. The algorithm comprises three stages: window principal component analysis, feedback stream clustering, and a feedback controller. In the window principal component analysis stage, the classic exponentially weighted attenuation function is used to mitigate the effect of concept drift in the data stream, and incremental feature extraction is performed over the sliding window to improve the iterative efficiency of the data in the window. To minimize the errors caused by variability in projection angles during dimensionality reduction, the feedback stream clustering stage alternates iterations of window clustering and cluster aggregation. To avoid the problems caused by manually adjusting the hyperparameters used in high-dimensional data stream clustering, the feedback controller adjusts the hyperparameters of the other two stages by analyzing the clustering results in real time and using a discriminant score to select a corresponding feedback strategy. Experimental comparisons between the proposed algorithm and traditional algorithms on multiple datasets demonstrate the effectiveness of the proposed algorithm.
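As a rough illustration of the window principal component analysis idea, the sketch below weights the samples in one sliding window by an exponential decay, so that newer samples dominate the covariance estimate, and then projects the window onto the leading principal components. It is not the paper's implementation: the function names, the decay rate `lam`, the window size and step, and the use of a full eigendecomposition per window (rather than an incremental update) are all assumptions made for illustration.

```python
import numpy as np


def decay_weights(n, lam=0.1):
    """Exponential decay weights for n samples ordered oldest to newest.

    The newest sample (index n - 1) has age 0 and weight 1; older samples
    get weight exp(-lam * age). `lam` is a hypothetical decay rate.
    """
    ages = np.arange(n - 1, -1, -1)
    return np.exp(-lam * ages)


def weighted_window_pca(window, k, lam=0.1):
    """Project one window (n samples x d features) onto k principal components.

    The mean and covariance are computed with the decay weights, so recent
    samples influence the projection more than old ones.
    """
    w = decay_weights(len(window), lam)
    mean = np.average(window, axis=0, weights=w)
    centered = window - mean
    # Weighted covariance matrix, shape (d, d).
    cov = (centered * w[:, None]).T @ centered / w.sum()
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]  # shape (d, k)
    return centered @ components, components, mean


if __name__ == "__main__":
    # Synthetic stand-in for a sensor stream: 200 samples, 10 features.
    rng = np.random.default_rng(0)
    stream = rng.normal(size=(200, 10))
    size, step = 50, 10  # hypothetical window size and slide step
    for start in range(0, len(stream) - size + 1, step):
        window = stream[start:start + size]
        projected, components, _ = weighted_window_pca(window, k=3, lam=0.05)
        print(start, projected.shape)
```

Recomputing the decomposition for every window keeps the sketch short; an incremental scheme would instead update the weighted mean and covariance as samples enter and leave the window.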
