Hadoop is an open source from Apache with a distributed file system and MapReduce distributed computing framework. The current Apache 2.0 license agreement supports on-demand payment by consumers for cloud platform services, helping users leverage their respective different hardware to provides cloud services. In cloud-based environment, there is a need to balance the resource requirements of workloads, optimize load performance, and the cloud compute costs to manage. When the processing power of clustered machines varies widely, such as when hardware is aging or overloaded, Hadoop offers a speculative execution (SE) optimization strategy, by monitoring task progress in real time, in the starting identical backup tasks on different nodes when multiple tasks under a job are not running at the same speed, providing the first to go. The completed calculations maintain the overall progress of the job. At present, the SE strategy’s incorrect selection of backup nodes and resource constraints may result in poor Hadoop performance, and subsequent tasks cannot be completed execution and other problems. This paper proposes an SE optimization strategy based on near data prediction, which analyzes the prediction of real-time task execution information to predict the required running time, select backup nodes based on actual requirements and approximate data to make the SE strategy achieve the best performance. Experiments prove that in a heterogeneous Hadoop environment, the optimization strategy can effectively improve the effectiveness and accuracy of various tasks and enhance the performance of cloud computing. Platform performance can benefits consumers better than before.
Read full abstract