Abstract
Nowadays, companies are faced with the task of processing huge quantities of data. As traditional database systems cannot handle this task in a cost-efficient manner, companies have built customized data processing frameworks. Cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term pay-per-usage basis. This paper attempts to schedule tasks on compute nodes so that data sent from one node to another has to traverse as few network switches as possible. The challenges and opportunities for efficient parallel data processing in cloud environments have been demonstrated, and Nephele, the first data processing framework to exploit the dynamic resource provisioning offered by IaaS clouds, has been presented. The overall utilisation of resources has been improved by assigning specific virtual machine types to specific tasks of a processing job and by automatically allocating or deallocating virtual machines in the course of a job execution. This has led to a substantial reduction in the cost of parallel data processing.
Highlights
Today many companies have to process huge amounts of data in a cost-efficient manner
The task is to exploit the dynamic resource provisioning offered by the IaaS clouds in order to achieve efficient parallel data processing in cloud environments
A server machine is a high-performance host that runs one or more server programs and shares its resources with clients
Summary
Today many companies have to process huge amounts of data in a cost-efficient manner. Classic examples are operators of Internet search engines, like Google, Yahoo, or Microsoft. The vast amount of data they have to deal with continuously has made traditional database solutions prohibitively expensive. Instead, these companies have popularized an architectural paradigm based on a large number of commodity servers. Problems are split into several independent subtasks, distributed among the available nodes, and computed in parallel. Many of these companies have built customized data processing frameworks. The cloud's virtualized nature helps to enable promising new use cases for efficient parallel data processing. The overall throughput of the network can be improved by minimising bottlenecks.
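The split, distribute, and compute-in-parallel pattern described in the summary can be illustrated with a minimal Python sketch. It is not Nephele's implementation or API. The function names (`split`, `count_words`, `parallel_word_count`) and the word-count workload are hypothetical, and a local thread pool stands in for the commodity servers.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into at most `parts` contiguous, independent chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(lines):
    """Subtask: count the words in one chunk; needs no data from other chunks."""
    return sum(len(line.split()) for line in lines)


def parallel_word_count(lines, workers=4):
    """Split the input, run the subtasks concurrently, and merge the results."""
    chunks = split(lines, workers)
    # Assumption: worker threads stand in for commodity nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)
```

Because each chunk is processed without reference to any other, the subtasks could in principle be placed on separate machines, with only the small partial counts sent back to be merged.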