Abstract

Cloud computing has made significant contributions to the Internet of Things, Big Data, and many other cutting-edge research areas in recent years. To handle cloud bursting, on-premises private clouds often extend their service capacity with off-premises public clouds, which requires migrating jobs and their corresponding data from private clouds to public clouds. For jobs executed in public clouds, promptly transferring the data they need from private clouds is essential for quick completion, since the volume of data is often large for cloud applications. In most cases, the Internet connection between private and public clouds has limited bandwidth. It would therefore be valuable if the underlying operating system could expedite reading data from hard drives to speed up moving data from private clouds to public clouds. Apache Hadoop is one of the most widely used platforms in the cloud computing community. It keeps multiple replicas of data across its cluster nodes to improve data availability. We designed and implemented a new model that enables computing nodes in Hadoop to fetch requested data from the cluster node with the least disk activity, regardless of its location, to hasten data access. Experimental results show that jobs can reduce their execution time by up to 80.83% under our model. Accordingly, our model can help accelerate job execution in both private and public clouds in a hybrid-cloud environment.
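The replica-selection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the node identifiers, and the use of a single numeric disk-activity metric (e.g. queued I/O requests, where lower is better) are all assumptions made for the example.

```python
def pick_replica(replica_nodes, disk_activity):
    """Pick the replica-holding node with the least disk activity.

    replica_nodes: node IDs that store a replica of the requested block.
    disk_activity: dict mapping node ID -> current disk-activity metric
                   (hypothetical; lower means the disk is less busy).
    """
    # Choose by current disk load, regardless of whether the node sits
    # in the private cloud or the public cloud.
    return min(replica_nodes, key=lambda node: disk_activity[node])


# Example: node "b" is nearly idle, so it is selected even if another
# replica holder is closer to the requesting node.
nodes = ["a", "b", "c"]
activity = {"a": 12, "b": 1, "c": 7}
print(pick_replica(nodes, activity))  # -> b
```

The design choice this illustrates is that locality is deliberately ignored: a remote but idle disk can serve a block faster than a local but busy one, which is what lets the model shorten data transfers from private to public clouds.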
