Abstract

Effectively analyzing geo-distributed datasets is an emerging demand in cloud-edge systems. Previous research mainly focuses on offloading suitable data-analytic tasks from hot or weak edges to the datacenter (DC) to minimize the response time of the current job. Since some datasets are accessed multiple times, we argue that re-distributing data along with task offloading benefits forthcoming jobs and improves overall performance, even though it may increase the completion time of the current job. To minimize the overall completion time of a sequence of jobs while guaranteeing the current job's response time and WAN usage, we formulate the ε-bounded geo-distributed data-driven task scheduling problem, taking heterogeneity into account. We then design runData, an online data-driven task scheduling scheme that offloads suitable tasks, together with their related data via piggybacking, to a DC based on delicately calculated probabilities. Through rigorous theoretical analysis, we prove that runData concentrates around its optimum with high probability. We implement runData on Spark. Both testbed and simulation results show that runData re-distributes appropriate data via piggybacking and achieves up to a 37% improvement in average response time compared with state-of-the-art task scheduling schemes.
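As a rough illustration of the probabilistic, data-driven offloading decision described above, the following Python sketch combines edge and DC load, how often a dataset is reused, a response-time bound, and a WAN budget into an offload-or-stay choice. All names, weights, and constraint forms here (offload_probability, schedule_task, the epsilon_bound parameter, and so on) are hypothetical assumptions for illustration only; they are not the paper's actual formulation of runData.

```python
import random

def offload_probability(edge_load, dc_load, data_reuse_count, alpha=0.5):
    """Estimate the probability of offloading a task (and piggybacking its
    input data) from a busy edge to the datacenter.

    A hotter edge and a frequently reused dataset both push the probability
    up, since re-distributing such data also benefits forthcoming jobs.
    This scoring rule is an assumption, not the paper's derivation.
    """
    load_gap = max(edge_load - dc_load, 0.0)
    reuse_bonus = 1.0 - 1.0 / (1.0 + data_reuse_count)
    return min(1.0, alpha * load_gap + (1.0 - alpha) * reuse_bonus)

def schedule_task(task, edge, dc, epsilon_bound):
    """Offload the task to the DC with the computed probability, but only if
    the current job's extra latency stays within the epsilon bound and the
    edge's WAN budget is not exceeded."""
    p = offload_probability(edge["load"], dc["load"], task["reuse_count"])
    extra_latency = task["data_size"] / edge["wan_bandwidth"]
    within_bound = extra_latency <= epsilon_bound * task["local_runtime"]
    within_wan = task["data_size"] <= edge["wan_budget"]
    if within_bound and within_wan and random.random() < p:
        # Data is piggybacked to the DC along with the offloaded task.
        edge["wan_budget"] -= task["data_size"]
        return "datacenter"
    return "edge"

if __name__ == "__main__":
    edge = {"load": 0.9, "wan_bandwidth": 100.0, "wan_budget": 500.0}
    dc = {"load": 0.3}
    task = {"data_size": 80.0, "local_runtime": 10.0, "reuse_count": 3}
    print(schedule_task(task, edge, dc, epsilon_bound=1.5))
```

In this sketch, the randomized choice plays the role of the delicately calculated probabilities mentioned in the abstract: tasks over heavily reused data on overloaded edges are more likely to be moved, while the bound and WAN checks protect the current job.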
