Abstract

Many cloud applications are data-intensive, requiring the processing of large data sets, and the MapReduce/Hadoop architecture has become the de facto processing framework for these applications. Large data sets are stored in data nodes in the cloud, which are typically SAN or NAS devices. Cloud applications process these data sets using a large number of application virtual machines (VMs), and the total completion time is an important performance metric. Many factors affect the total completion time of a processing task, such as the load on individual servers, the task scheduling mechanism, and communication and data-access bottlenecks. For data-intensive applications, one dominant factor is the access latency from processing nodes to data nodes. Ideally, all data access would be kept local to minimize access latency. This is often not possible, because data sets may be very large and capacity constraints on processing nodes may prevent VMs from being placed in their ideal locations. When not all data access can be kept local, the placement of VMs should be optimized so that the impact of data access latencies on completion times is minimized. We address this problem of optimized VM placement: given the locations of the data sets, determine where to place the VMs so as to minimize data access latencies while satisfying system constraints. We present optimal algorithms for determining VM locations that satisfy various constraints, with objectives that capture natural tradeoffs between minimizing latencies and incurring bandwidth costs. We also consider the problem of incorporating inter-VM latency constraints. In this case, the associated location problem is NP-hard and admits no effective approximation within a factor of 2 − ϵ for any ϵ > 0. We discuss an effective heuristic for this case and use simulation to evaluate the impact of the various tradeoffs in the optimization objectives.
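The abstract does not describe the authors' algorithms. The following is only an illustrative sketch of one natural way to model the basic case, which has capacity constraints but no inter-VM latency constraints. It treats placement as a min-cost assignment problem and solves it with min-cost flow, using successive shortest paths and Bellman-Ford. Several parts are assumptions rather than the paper's formulation:

* Each VM reads from exactly one data node.
* Each server hosts a fixed number of VM slots.
* The latency/bandwidth tradeoff is a weighted sum `latency + lam * bandwidth`.

The names `place_vms`, `latency`, `bandwidth`, and `lam` are all hypothetical. The NP-hard variant with inter-VM latency constraints is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    to: int
    cap: int
    cost: float
    rev: int  # index of the paired reverse edge in graph[to]


class MinCostFlow:
    """Successive-shortest-path min-cost flow (Bellman-Ford for residual costs)."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append(Edge(v, cap, cost, len(self.graph[v])))
        self.graph[v].append(Edge(u, 0, -cost, len(self.graph[u]) - 1))

    def flow(self, s, t, need):
        n, total_flow, total_cost = len(self.graph), 0, 0.0
        while total_flow < need:
            dist, prev = [float("inf")] * n, [None] * n
            dist[s] = 0.0
            # Bellman-Ford: reverse residual edges may have negative cost.
            for _ in range(n - 1):
                updated = False
                for u in range(n):
                    if dist[u] == float("inf"):
                        continue
                    for i, e in enumerate(self.graph[u]):
                        if e.cap > 0 and dist[u] + e.cost < dist[e.to]:
                            dist[e.to], prev[e.to] = dist[u] + e.cost, (u, i)
                            updated = True
                if not updated:
                    break
            if dist[t] == float("inf"):
                break  # no augmenting path: capacity exhausted
            push, v = need - total_flow, t
            while v != s:  # find the bottleneck capacity on the path
                u, i = prev[v]
                push, v = min(push, self.graph[u][i].cap), u
            v = t
            while v != s:  # augment along the path
                u, i = prev[v]
                e = self.graph[u][i]
                e.cap -= push
                self.graph[v][e.rev].cap += push
                v = u
            total_flow += push
            total_cost += push * dist[t]
        return total_flow, total_cost


def place_vms(data_of, capacity, latency, bandwidth, lam=0.0):
    """Hypothetical helper: data_of[v] is VM v's data node, capacity[s] is
    server s's VM slots, latency[s][d] / bandwidth[s][d] are per-pair costs.
    Returns (assignment, cost) where assignment[v] is VM v's server."""
    n_vm, n_srv = len(data_of), len(capacity)
    src, sink = n_vm + n_srv, n_vm + n_srv + 1
    mcf = MinCostFlow(n_vm + n_srv + 2)
    for v, d in enumerate(data_of):
        mcf.add_edge(src, v, 1, 0.0)
        for s in range(n_srv):
            mcf.add_edge(v, n_vm + s, 1, latency[s][d] + lam * bandwidth[s][d])
    for s, cap in enumerate(capacity):
        mcf.add_edge(n_vm + s, sink, cap, 0.0)
    flow, cost = mcf.flow(src, sink, n_vm)
    if flow < n_vm:
        raise ValueError("not enough server capacity for all VMs")
    assignment = [None] * n_vm
    for v in range(n_vm):
        for e in mcf.graph[v]:
            # A saturated VM->server edge means the VM was placed there.
            if n_vm <= e.to < n_vm + n_srv and e.cap == 0:
                assignment[v] = e.to - n_vm
    return assignment, cost
```

As a usage example, take two data nodes and three servers with `latency = [[1, 5], [5, 1], [3, 3]]`, `capacity = [1, 1, 2]`, and four VMs with `data_of = [0, 0, 1, 1]`. Each of the first two servers takes its "local" VM, and the overflow goes to the third server, for a total cost of 8. Raising `lam` shifts placements toward servers with lower bandwidth cost, at the expense of latency.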
