Abstract

This paper identifies four common misconceptions about the scalability of volunteer computing on big data problems. The misconceptions are clarified by analyzing the relationship between scalability and its impact factors, including the problem size of the big data, the heterogeneity and dynamics of volunteers, and the overlay structure. The paper proposes optimization strategies to find the optimal overlay for a given big data problem, forming multiple overlays to optimize the performance of the individual steps of the MapReduce paradigm. The goal of the optimization is to achieve the maximum overall performance with the minimum number of volunteers, without overusing resources. The paper demonstrates that simulations over the concerned factors can quickly locate the optimization points. It concludes that always welcoming more volunteers overuses the available resources, because additional volunteers do not always benefit the overall performance, and that an optimal use of volunteers can be found for a given big data problem despite the dynamics and opportunism of volunteers.
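
The following is a minimal sketch, not the paper's code, of the kind of simulation sweep the abstract describes: speedup is estimated for a range of overlay sizes under intermittent volunteer availability, and the smallest overlay whose speedup is within a tolerance of the best observed value is taken as the optimization point. The availability model, the overhead terms, and the 95% threshold are assumptions made purely for illustration.

```python
# Hypothetical sketch: sweep overlay sizes in a toy simulation and pick the
# smallest overlay whose speedup is close to the best observed speedup.
import random

def simulated_speedup(n_volunteers, availability=0.7, trials=200):
    """Average speedup of n volunteers that are only intermittently available."""
    total = 0.0
    for _ in range(trials):
        # Each volunteer contributes a cycle only if it is online this round.
        active = sum(1 for _ in range(n_volunteers) if random.random() < availability)
        # Coordination overhead grows with overlay size and erodes the gain
        # (overhead coefficients are illustrative assumptions).
        total += active / (1.0 + 0.02 * n_volunteers + 0.0002 * n_volunteers ** 2)
    return total / trials

def find_optimization_point(max_volunteers=512, tolerance=0.95):
    """Smallest overlay size reaching `tolerance` of the best simulated speedup."""
    sizes = list(range(16, max_volunteers + 1, 16))
    curve = {n: simulated_speedup(n) for n in sizes}
    best = max(curve.values())
    return min(n for n, s in curve.items() if s >= tolerance * best)

if __name__ == "__main__":
    print("Optimization point (volunteers):", find_optimization_point())
```

In this toy model the speedup curve flattens and then declines as overhead grows, so near-maximum performance is reached well before the largest overlay, which mirrors the paper's conclusion that adding more volunteers does not always bring benefit.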

Highlights

  • Data produced by scientific projects to answer scientific hypotheses, or generated by business transactions and social events, have grown to a huge size

  • Existing work related to this paper has focused on two aspects: the impact of dynamics and opportunism on big data performance, and optimization approaches for coping with these uncertainties

  • The last step is to use the outputs of the optimization in the previous four steps, including the minimum overlay size (MIOS), the maximum number of map overlays (MXMO), the minimum reduction in overlay size (MROS) and the selected individual volunteers, to construct an optimal overlay that achieves the maximum speedup against the dynamics of volunteers while using the minimum number of volunteers (a minimal sketch of this construction follows the highlights)
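
The sketch below is a hypothetical illustration, not the paper's implementation, of how the outputs of the earlier optimization steps might be combined into a single overlay plan. The field names, the round-robin assignment, and the reading of MROS as the size reserved for the reduce step are all assumptions for illustration only.

```python
# Hypothetical sketch: assemble MIOS, MXMO, MROS and the selected volunteers
# into one overlay plan for the map and reduce steps.
from dataclasses import dataclass

@dataclass
class OverlayPlan:
    map_overlays: list      # one volunteer list per map overlay
    reduce_overlay: list    # volunteers reserved for the reduce step

def build_overlay(selected_volunteers, mios, mxmo, mros):
    """Construct an overlay respecting MIOS/MXMO/MROS with as few volunteers as possible."""
    if len(selected_volunteers) < mios + mros:
        raise ValueError("not enough selected volunteers for the minimum overlay sizes")
    # Reserve the reduce-side volunteers first, then spread the remaining
    # volunteers round-robin over at most MXMO map overlays.
    reduce_overlay = selected_volunteers[:mros]
    mappers = selected_volunteers[mros:mros + mios]
    map_overlays = [mappers[i::mxmo] for i in range(min(mxmo, len(mappers)))]
    return OverlayPlan(map_overlays=map_overlays, reduce_overlay=reduce_overlay)

# Example: 12 selected volunteers, MIOS=8, MXMO=4, MROS=2
plan = build_overlay([f"v{i}" for i in range(12)], mios=8, mxmo=4, mros=2)
```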


Summary

Introduction

Data produced by scientific projects to answer scientific hypotheses, or generated by business transactions and social events, have grown to a huge size. The reality is that small or medium businesses or scientific projects are unable to invest in such a data center; what they can make use of are the existing commodity computers in the organization. These corporate desktops or laptops are, however, unreliable and not dedicated to a single task, and when the situation is extended to the Internet scale, the donated compute cycles become even more dynamic and opportunistic. Previous work has confirmed the scalability of volunteer computing for big data processing under these impact factors. For a given big data problem, the integrated platform and algorithm support our research methodology, which consists of investigations into how scalability behaves and converges with respect to these factors. The results confirm that the scalability of volunteer computing for big data processing follows a logarithm-like scale in terms of speedup and a reciprocal-inverse-like scale in terms of speedup growth rate.
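
The following minimal sketch illustrates the two scaling claims above by fitting a logarithm-like model S(n) = a·ln(n) + b to speedup measurements; the implied growth rate dS/dn = a/n is then reciprocal-like. The sample values are made up for illustration and are not the paper's results.

```python
# Minimal sketch with illustrative (made-up) measurements of speedup versus
# the number of volunteers; fits S(n) = a*ln(n) + b by least squares.
import math

samples = [(8, 5.1), (16, 7.9), (32, 10.8), (64, 13.6), (128, 16.2), (256, 18.9)]

def fit_log_model(points):
    """Least-squares fit of S(n) = a*ln(n) + b."""
    xs = [math.log(n) for n, _ in points]
    ys = [s for _, s in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

a, b = fit_log_model(samples)
for n, s in samples:
    fitted = a * math.log(n) + b
    growth_rate = a / n  # reciprocal-like decline in the marginal benefit of volunteers
    print(f"n={n:4d}  measured={s:5.1f}  fitted={fitted:5.1f}  dS/dn={growth_rate:.3f}")
```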

Related Work
MapReduce Workflow in Dynamic and Opportunistic Environments
The Measurement of Performance
The Setting of Dynamics and Workload
Misconception 4
Case Study
Findings
Conclusions

