Abstract
Several communication-aware mapping techniques have been proposed in recent years to improve the performance of distributed systems based on both off-chip and on-chip networks. Some of these proposals rely on heuristic search to find pseudo-optimal assignments of tasks to processing elements. However, improvements in technology integration have allowed a significant increase in the number of network nodes, which requires the heuristic search to be accelerated. In this paper, we present a comparative study of the local search method used in a communication-aware mapping technique when it is implemented on different parallel architectures. We compare the performance of a version of the local search method executed on a single Graphics Processing Unit (GPU) with that of an MPI version executed on a supercomputer with the same theoretical performance as the GPU platform, so that the comparison is fair. We consider a GPU based on the Fermi architecture and evaluate the improvements achieved by some of the new architectural features of this platform. The results show that a mixed parallel implementation on a single GPU outperforms the MPI implementation of the local search method. These results validate the GPU implementation as a very cost-effective accelerator for the local search method.