Abstract
In recent years, the volume and velocity of streaming data have increased rapidly, and real-time processing scenarios for streaming data have multiplied accordingly. Stream processing tasks face major challenges in load optimization, task scheduling, and resource management, and throughput prediction for stream processing tasks is a key technology in all of these areas. To predict the throughput of stream processing tasks accurately and efficiently, we propose a novel model named the LPG-model. It comprises three main components: a light gradient boosting machine (LightGBM), incremental principal component analysis (IPCA), and an evolving deep gated recurrent unit (GRU) network. Unlike existing state-of-the-art models, the LPG-model offers not only a network structure adaptation mechanism (a hidden layer adaptation mechanism) but also feature processing mechanisms for streaming data. Data preprocessing handles missing values through an incremental interpolation mechanism and offers two feature normalization methods through incremental normalization mechanisms. LightGBM and IPCA together provide an efficient dimensionality reduction mechanism that improves the prediction efficiency of the LPG-model. The hidden layer growing mechanism of the evolving deep GRU network can learn new knowledge from data streams while retaining previous knowledge, and it can also capture the temporal aspects of the data streams. Experimental results on four open-source benchmarks show that, under the prequential test-then-train protocol, the LPG-model is more accurate and efficient than state-of-the-art algorithms and networks, which supports its effectiveness in throughput prediction scenarios for stream processing tasks.
Furthermore, numerical results on standard data stream benchmark problems indicate that the LPG-model has the potential to reduce execution time on high-dimensional data streams while maintaining high classification accuracy.
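The prequential test-then-train protocol and the idea of incremental normalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the LPG-model. A one-feature online linear regressor stands in for the evolving deep GRU network, and z-score normalization maintained with Welford's algorithm is assumed as one plausible incremental normalization (the abstract does not name the two methods it uses). The names `RunningZScore`, `OnlineLinearModel`, `make_stream`, and `prequential_errors`, the clipping range, the learning rate, and the synthetic stream are all hypothetical choices made for illustration.

```python
import math
import random


class RunningZScore:
    """Incremental z-score normalization (Welford's algorithm); an assumed stand-in."""

    def __init__(self, clip=3.0):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean
        self.clip = clip  # assumption: clip z-scores to keep early updates stable

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def transform(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0
        z = (x - self.mean) / std
        return max(-self.clip, min(self.clip, z))


class OnlineLinearModel:
    """One-feature linear regressor trained by SGD, one sample at a time."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def learn(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def make_stream(n, seed=0):
    """Synthetic (load, throughput) pairs; purely illustrative data."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        x = rng.uniform(0.0, 100.0)
        stream.append((x, 0.5 * x + 3.0 + rng.gauss(0.0, 1.0)))
    return stream


def prequential_errors(stream):
    """Test-then-train: predict each sample first, then learn from it."""
    scaler = RunningZScore()
    model = OnlineLinearModel()
    errors = []
    for x, y in stream:
        z = scaler.transform(x)                   # statistics from past samples only
        errors.append(abs(model.predict(z) - y))  # test on the unseen sample
        model.learn(z, y)                         # then train on that same sample
        scaler.update(x)                          # finally fold it into the statistics
    return errors
```

Because every sample is scored before the model or the scaler has seen it, the mean of the returned errors is an estimate of performance on unseen data that needs no separate hold-out set. Calling `prequential_errors(make_stream(2000))` and averaging windows of the result shows how the error evolves as the stream progresses.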