Optimal Scheduling of Data-Intensive Applications in Cloud-Based Video Distribution Services

Xili Dai, Xiaomin Wang, Nianbo Liu

doi:10.1109/tcsvt.2016.2565918

Abstract

Cloud computing opens a new door for designing next-generation video distribution platforms. As video services move to the cloud, related data-intensive applications, such as recommender systems, automatic scoring mechanisms, and prediction algorithms, should also be cloud-based. Since traditional cloud file systems like MapReduce/Hadoop exhibit cost disadvantages in data access, the Cache A Replica On Modification (CAROM) cloud file system has been designed to achieve high data availability at low storage cost, providing resiliency with high efficiency. In this paper, we focus on moving data-intensive applications to the CAROM cloud file system, optimizing access latency while retaining the benefit of low storage cost. To achieve this, we propose a novel scheduling mechanism that acts as a lubricant between CAROM and data-intensive applications. Our scheme consists of three parts. First, a tripartite graph is employed to describe the relationships among tasks, computation nodes, and data nodes. Second, we give a 1:1:1 framework for the case in which the data of a task are already stored in the cache, and introduce two variant frameworks, 1:1:M and 1:N:M, that account for limited cache size and task performance. Finally, a $k$-list algorithm is proposed as an approximation algorithm, and its mathematical definitions and proofs are given in detail. We conduct simulations to evaluate our scheme, and the results show that our algorithm performs significantly better than the general two-layer algorithm.

Full Text