Abstract
In the big data era, the speed of analytical processing depends on the ability of storage and retrieval systems to handle large amounts of data. While distributed data crunching applications can yield useful information, analysts face difficult challenges: they need to predict how much data to process and where, so as to obtain an optimal data crunching cost while also respecting deadlines and service level agreements within a limited budget. In today's data centers, on-demand data processing and data transfer requests coming from distributed applications are usually expressed as aperiodic tasks. In this paper, we address the problem of scheduling aperiodic tasks with deadline constraints in inter-Cloud environments. In massively multithreaded computing systems that deal with data-intensive applications, Hadoop and BaTs tasks arrive periodically, which challenges traditional scheduling approaches previously proposed for supercomputing. Here, we consider the deadline as the main constraint and propose a method to estimate the number of resources needed to schedule a set of aperiodic tasks, considering both execution and data transfer costs. Starting from classical scheduling techniques and considering asynchronous task handling, we analyze the possibility of decoupling task arrival from task creation, scheduling, and execution, sets of actions that can be put into a peer-to-peer relation over a network or over a client-server architecture in the Cloud.
Based on a mathematical model, and using different simulation scenarios, we prove the following statements: (1) multiple sources of independent aperiodic tasks can be treated as a single source; (2) with respect to the global deadline, task migration between different regional centers is the appropriate solution when the estimated number of resources exceeds a data center's capacity; and (3) in a heterogeneous data center, a higher number of resources is needed for the same request in order to respect the deadline constraints. We believe these results will benefit researchers and practitioners alike who are interested in optimizing resource management in data centers in response to the novel challenges posed by next-generation big data applications.
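
As a rough illustration of the resource-estimation idea, and not the mathematical model used in the paper, the sketch below computes a simple lower bound on the number of resources needed to finish a batch of tasks before a global deadline. It rests on several assumptions of ours: all tasks are available at time zero, each task's cost is the sum of its execution time and its data transfer time, resources are identical, and a hypothetical `speed` factor stands in for slower machines. The names `Task`, `merge_sources`, and `estimate_resources` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    exec_time: float      # processing time on a reference resource (seconds)
    transfer_time: float  # time needed to move the task's data (seconds)


def merge_sources(*sources):
    """Combine several independent task sources into one batch (assumption:
    arrival order is ignored, all tasks are ready at time zero)."""
    return [task for source in sources for task in source]


def estimate_resources(tasks, deadline, speed=1.0):
    """Lower bound on the number of identical resources needed so that the
    total work fits before `deadline`.

    `speed` is a hypothetical factor: values below 1.0 model slower machines
    and stretch execution time only, not transfer time.
    """
    if deadline <= 0 or speed <= 0:
        raise ValueError("deadline and speed must be positive")
    costs = [t.exec_time / speed + t.transfer_time for t in tasks]
    if any(cost > deadline for cost in costs):
        raise ValueError("at least one task cannot meet the deadline on its own")
    # Total work divided by the time available per resource, rounded up.
    return math.ceil(sum(costs) / deadline)


source_a = [Task(4, 1), Task(3, 1)]
source_b = [Task(6, 2), Task(2, 1)]
batch = merge_sources(source_a, source_b)

print(estimate_resources(batch, deadline=10))             # 2
print(estimate_resources(batch, deadline=10, speed=0.8))  # 3
```

Because tasks are not split across resources, this bound can underestimate the true requirement; it only shows how execution and transfer costs, the deadline, and resource speed interact. Comparing the estimate with a data center's capacity would then indicate when migration to another regional center has to be considered.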