Abstract
This paper identifies four common misconceptions about the scalability of volunteer computing on big data problems. The misconceptions are then clarified by analyzing the relationship between scalability and its impact factors, including the problem size of the big data, the heterogeneity and dynamics of volunteers, and the overlay structure. The paper proposes optimization strategies to find the optimal overlay for a given big data problem, forming multiple overlays to optimize the performance of the individual steps of the MapReduce paradigm. The goal of the optimization is to achieve the maximum overall performance with a minimum number of volunteers, rather than overusing resources. The paper demonstrates that simulations over the concerned factors can quickly find the optimization points. It concludes that always welcoming more volunteers is an overuse of available resources, because additional volunteers do not always benefit overall performance, and that an optimal use of volunteers can be found for a given big data problem even under the dynamics and opportunism of volunteers.
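As a rough illustration of how simulation can locate such an optimization point, the sketch below assumes a hypothetical logarithm-like speedup model (the constants a and b are made up for illustration, not taken from the paper) and searches for the smallest number of volunteers beyond which adding one more volunteer improves speedup by less than a chosen threshold.

```python
import math

def speedup(n, a=4.0, b=1.0):
    """Hypothetical logarithm-like speedup for n volunteers (not measured data)."""
    return a * math.log(n) + b

def optimization_point(max_volunteers, epsilon=0.05):
    """Smallest n at which adding one more volunteer gains less than epsilon speedup."""
    for n in range(2, max_volunteers):
        if speedup(n + 1) - speedup(n) < epsilon:
            return n
    return max_volunteers

# With these hypothetical constants, the per-volunteer speedup gain drops below
# 0.05 at roughly n = 80; volunteers beyond that point add little benefit.
print(optimization_point(10_000))
```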
Highlights
Data, whether produced by scientific projects to answer scientific hypotheses or generated by business transactions and social events, have grown to a huge size
The existing work related to this paper has focused on two aspects: the impact of dynamics and opportunism on big data performance, and various optimization approaches to coping with these uncertainties
The last step uses the outputs of the optimization from the previous four steps, including the minimum overlay size (MIOS), the maximum number of map overlays (MXMO), the minimum reduction in overlay size (MROS), and the selected individual volunteers, to construct an optimal overlay that achieves the maximum speedup against the dynamics of volunteers while using the minimum number of volunteers
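A minimal, hypothetical sketch of this final construction step is shown below. It assumes MIOS, MXMO and MROS are plain integers and, purely for illustration, treats MROS as the size reserved for the reduce overlay; the paper's actual construction algorithm and data structures may differ.

```python
from dataclasses import dataclass, field

@dataclass
class OverlayPlan:
    map_overlays: list = field(default_factory=list)    # volunteer groups, one per map overlay
    reduce_overlay: list = field(default_factory=list)  # volunteers serving the reduce step

def build_overlay(selected_volunteers, mios, mxmo, mros):
    """Combine the outputs of the previous optimization steps into one overlay,
    using only the minimum number of volunteers needed (mios)."""
    if len(selected_volunteers) < mios:
        raise ValueError("fewer volunteers than the minimum overlay size (MIOS)")
    used = selected_volunteers[:mios]           # do not overuse surplus volunteers
    reduce_overlay = used[:mros]                # reserve the reduce overlay first
    mappers = used[mros:]
    n_map = max(1, min(mxmo, len(mappers)))     # at most MXMO map overlays
    map_overlays = [mappers[i::n_map] for i in range(n_map)]
    return OverlayPlan(map_overlays, reduce_overlay)

# Example with hypothetical values: 20 selected volunteers, MIOS=16, MXMO=4, MROS=4
plan = build_overlay([f"v{i}" for i in range(20)], mios=16, mxmo=4, mros=4)
print([len(g) for g in plan.map_overlays], len(plan.reduce_overlay))  # [3, 3, 3, 3] 4
```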
Summary
Data, whether produced by scientific projects to answer scientific hypotheses or generated by business transactions and social events, have grown to a huge size. The reality is that small or medium businesses or scientific projects are unable to invest in such a data center; what they can make use of are the existing commodity computers in the organization. However, corporate desktops or laptops are unreliable and are not dedicated to a single task, and when this situation is extended to the Internet scale, the donated compute cycles become even more dynamic and opportunistic. The previous work has confirmed the scalability of volunteer computing for big data processing under these impact factors. For a given big data problem, the integrated platform and algorithm support our research methodology, which consists of investigations into how scalability grows and converges with respect to these factors. The investigations confirm that the scalability of volunteer computing for big data processing follows a logarithm-like scale in terms of speedup and a reciprocal-like scale in terms of speedup growth rate.
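A short derivation, assuming a logarithm-like speedup $S(n) = a\ln n + b$ for $n$ volunteers (with hypothetical constants $a$ and $b$), shows why the speedup growth rate then follows a reciprocal-like curve:

$\Delta S(n) = S(n+1) - S(n) = a\ln\frac{n+1}{n} \approx \frac{a}{n}$

which shrinks toward zero as $n$ grows, so the speedup converges and extra volunteers eventually bring little additional benefit.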