Abstract

Internet of Things (IoT)-enabled applications use sensors and actuators to collect big data, which are processed by big data frameworks such as Spark. Data processing tasks are generally precedence constrained, and the computation results are transmitted to other IoT devices. In this article, we consider the Spark workflow problem of scheduling tasks with data affinity to heterogeneous servers to minimize the maximum completion time (makespan). In a Spark instance, jobs are precedence constrained, and the stages within each job are also precedence constrained, so the number of feasible topological stage orders is large. It is difficult to balance task execution times, which are determined by the heterogeneous servers, against the transmission times caused by data affinity. We propose a scheduling optimization algorithm framework consisting of five components: 1) temporal parameter calculation; 2) ready stage adding; 3) task sequencing; 4) resource allocation; and 5) schedule improvement. Strategies for each component are developed, and the algorithmic components are statistically calibrated over a comprehensive set of instances. The proposed algorithm is compared against two classic algorithms for similar problems, modified for our setting, on typical scientific workflow instances. The experimental results demonstrate the effectiveness of the proposed algorithm for the considered problem.
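The core trade-off described above, heterogeneous execution times versus data-affinity transfer costs on a precedence-constrained DAG, can be illustrated with a minimal earliest-finish-time list-scheduling sketch. This is a generic HEFT-style heuristic, not the paper's calibrated five-component algorithm, and all task, server, and transfer numbers below are invented for illustration:

```python
# Illustrative earliest-finish-time list scheduling on a stage DAG.
# NOT the paper's algorithm: a generic greedy sketch with made-up data.
from collections import deque

def topological_order(preds, n):
    """Return one topological order of tasks 0..n-1 from predecessor lists."""
    indeg = [len(preds[t]) for t in range(n)]
    succs = [[] for _ in range(n)]
    for t in range(n):
        for p in preds[t]:
            succs[p].append(t)
    q = deque(t for t in range(n) if indeg[t] == 0)
    order = []
    while q:
        t = q.popleft()
        order.append(t)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                q.append(s)
    return order

def schedule(exec_time, preds, transfer):
    """Greedily place each task (in topological order) on the server that
    minimizes its finish time. exec_time[t][s] is the runtime of task t on
    server s; transfer[(p, t)] is the transmission delay paid only when t
    and its predecessor p run on different servers (data affinity)."""
    n, m = len(exec_time), len(exec_time[0])
    server_free = [0.0] * m   # earliest idle time of each server
    finish = [0.0] * n        # finish time of each task
    placed = [0] * n          # chosen server of each task
    for t in topological_order(preds, n):
        best = None
        for s in range(m):
            # task may start once server s is idle and all inputs arrived
            ready = server_free[s]
            for p in preds[t]:
                cost = transfer[(p, t)] if placed[p] != s else 0.0
                ready = max(ready, finish[p] + cost)
            f = ready + exec_time[t][s]
            if best is None or f < best[0]:
                best = (f, s)
        finish[t], placed[t] = best
        server_free[best[1]] = best[0]
    return max(finish), placed

# Toy instance: diamond DAG 0 -> {1, 2} -> 3 on two heterogeneous servers.
exec_time = [[3, 2], [2, 4], [4, 3], [2, 2]]   # exec_time[task][server]
preds = [[], [0], [0], [1, 2]]
transfer = {(0, 1): 1, (0, 2): 1, (1, 3): 2, (2, 3): 2}
makespan, placement = schedule(exec_time, preds, transfer)
```

Because the greedy rule weighs each server's idle time, the task's server-dependent runtime, and the cross-server transfer delays together, tasks cluster on the server holding their input data unless a faster server outweighs the transfer cost, which is exactly the balance the proposed framework optimizes more systematically.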
