Abstract
With increasing demand for cloud computing technology, cloud infrastructures are utilized to their maximum limits. There is a high possibility that commodity servers that are used in Hadoop Distributed File System (HDFS) based cloud data center will fail often. However, the selection of source and destination data nodes for re-replication of data has so far not been adequately addressed. In order to balance the workload among nodes during re-replication phase and reduce impact on cluster normal jobs’ performance, we develop a re-replication scheme that takes into consideration of both performance and reliability perspectives. The appropriate nodes for re-replication are selected based on Analytic Hierarchy Process (AHP) with the consideration of the current utilization of resources by the cluster normal jobs. Toward effective data re-replication, we investigate the feasibility of using linear regression and local regression methods to predict resource utilization. Simulation results show that our proposed approach can reduce re-replication time, total job execution time and top-of-rack network traffic compared to baseline HDFS, consequently increases the reliability of the system and reduces performance impacts on users jobs. Regarding feasibility study of prediction methods, both regression methods are good enough to predict short time future resource utilization for re-replication.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.