Abstract

The recent deluge of data to be processed represents one of the major challenges in the computational field. High-performance computing (HPC) systems can help address this problem when the data can be divided into chunks that are processed in parallel. However, due to the intrinsic characteristics of data-intensive problems, these applications can present severe load imbalances, making it difficult to use the available resources efficiently. This work proposes a strategy for dynamically analyzing and tuning the partition factor used to generate the data chunks. To reduce the load imbalance, and thereby the overall execution time, the strategy divides the chunks with the largest computation times and gathers contiguous chunks with the smallest computation times. The criteria for dividing or joining chunks are based on the chunks' execution times (average and standard deviation) and the number of processing nodes in use. We have evaluated our strategy using simulation and a real data-intensive application, obtaining promising results: the total execution time improved by up to 55%.
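To make the divide/join criteria concrete, the following is a minimal Python sketch of one possible tuning step. It assumes a threshold of one standard deviation around the mean chunk time and a hypothetical split_factor parameter; these names and thresholds are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of the divide/join decision described above.
    # The one-sigma thresholds and split_factor are assumptions for illustration.
    from statistics import mean, stdev

    def retune_partition(chunk_times, num_nodes, split_factor=2):
        """Return ('split', ...) / ('join', ...) decisions for the next iteration."""
        mu, sigma = mean(chunk_times), stdev(chunk_times)
        decisions = []
        i = 0
        while i < len(chunk_times):
            t = chunk_times[i]
            if t > mu + sigma:
                # Chunk dominates the load: divide it into smaller pieces.
                decisions.append(("split", i, split_factor))
                i += 1
            elif (t < mu - sigma and i + 1 < len(chunk_times)
                  and chunk_times[i + 1] < mu - sigma):
                # Two contiguous lightweight chunks: gather them into one.
                decisions.append(("join", i, i + 1))
                i += 2
            else:
                i += 1
        # Keep at least one chunk per processing node so all nodes stay busy.
        joins = sum(1 for d in decisions if d[0] == "join")
        if len(chunk_times) - joins < num_nodes:
            decisions = [d for d in decisions if d[0] != "join"]
        return decisions

In this sketch, the decisions would be applied to regenerate the partition before the next processing round, so the partition factor adapts as the measured chunk times evolve.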
