Abstract

Current IT technologies face a strong need to scale high-performance analysis up to large-scale datasets. The volume and complexity of data gathered in both public (e.g. on the web) and enterprise (e.g. digitised internal document bases) domains have grown tremendously over the last few years, posing new challenges to providers of high-performance computing (HPC) infrastructures; this is recognised in the community as the Big Data problem. In contrast to typical HPC applications, Big Data applications are not oriented towards reaching the peak performance of the infrastructure and thus fit the capacity infrastructure model better than the capability one, which makes Cloud infrastructures preferable to HPC. However, given the increasingly vanishing difference between these two infrastructure types, i.e. Cloud and HPC, it makes sense to investigate the ability of traditional HPC infrastructures to execute Big Data applications as well, despite their relatively poor efficiency compared with traditional, highly optimised HPC applications. This paper discusses the main state-of-the-art parallelisation techniques utilised in both the Cloud and HPC domains and evaluates them with an exemplary text processing application on a testbed HPC cluster.
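The data-parallel, capacity-style processing described above can be illustrated with a minimal sketch. The paper's actual application, framework, and cluster setup are not specified here; this hypothetical example merely shows the map/reduce pattern common to both Cloud and HPC parallelisation of text workloads, using Python's standard-library process pool as a stand-in for a cluster scheduler.

```python
# Hypothetical illustration only: a data-parallel word count in the
# map/reduce style typical of Big Data text processing. Each worker
# counts words in one chunk (map); partial counts are merged (reduce).
from collections import Counter
from multiprocessing import Pool


def count_words(chunk):
    # Map step: word-occurrence counts for one text chunk.
    return Counter(chunk.split())


def parallel_word_count(chunks, workers=2):
    # Scatter chunks across worker processes, then merge partial counts.
    with Pool(workers) as pool:
        partials = pool.map(count_words, chunks)
    return sum(partials, Counter())


if __name__ == "__main__":
    docs = ["big data on hpc", "hpc and cloud", "big data and cloud"]
    print(parallel_word_count(docs))
```

On a real HPC cluster the same pattern would typically be expressed with MPI or a Big Data framework such as Hadoop or Spark; the process pool here only stands in for that distribution layer.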
