Abstract

Data Stream Processing (DSP) applications, which generate real-time analytics over continuous data flows, have become prevalent in recent years. Task placement is an essential part of deploying DSP applications. Because determining the optimal task placement is NP-hard, several efficient heuristics have been designed, and Deep Reinforcement Learning (DRL) has been used to train the scheduling agent. The current DRL-based approach assumes that all resources, including CPU, memory, and network, are homogeneous. In many scenarios, however, the available computation and network resources are heterogeneous. To address this, we devise a general DRL-based resource-aware framework that models resources using graph embedding and an attention mechanism to predict the placement. Furthermore, to accelerate training and improve throughput, we propose an efficient throughput estimation tool that estimates throughput with high accuracy. We integrated our scheduling framework into Apache Flink and conducted comprehensive tests using multiple synthetic and real DSP applications. The experimental results show that our framework increases throughput by 64%, 42%, and 29% on average compared with three state-of-the-art strategies, respectively.
