Abstract
This paper introduces two parallel memory machines, the discrete memory machine (DMM) and the unified memory machine (UMM). Unlike well-studied theoretical models of parallel computation such as the parallel random access machine, these machines are practical and capture the essential features of memory access on graphics processing units (GPUs). As a first step toward developing algorithmic techniques on the DMM and the UMM, we evaluate the computing time of contiguous access and stride access to memory on these models. We then present parallel algorithms that transpose a 2D array on these models and evaluate their performance. Finally, we show that, for any permutation given offline, the data in an array can be moved efficiently according to that permutation on both the DMM and the UMM. Because the computing time of our permutation algorithms on the DMM and the UMM equals the sum of the lower bounds imposed by the memory bandwidth limitation and the latency limitation, the algorithms are optimal from a theoretical point of view. We believe that the DMM and the UMM can serve as good theoretical platforms for developing algorithmic techniques for GPUs.
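To make the difference between contiguous and stride access concrete, the following is a minimal sketch of how one round of memory requests might be costed. The abstract does not define the models, so everything here is an assumption: we take memory to be split into w banks with address a in bank a mod w for the DMM-style cost, and into address groups of w consecutive cells for the UMM-style cost. The function names dmm_cost and umm_cost are hypothetical and latency is ignored.

```python
from collections import Counter


def dmm_cost(addresses, w):
    """Assumed DMM-style cost of one round of requests.

    Assumption: address a lives in bank a % w, requests to the same bank
    are served one after another, and duplicate addresses are merged.
    The cost is the largest number of distinct addresses in one bank.
    """
    per_bank = Counter(a % w for a in set(addresses))
    return max(per_bank.values())


def umm_cost(addresses, w):
    """Assumed UMM-style cost of one round of requests.

    Assumption: cells a with the same a // w form one address group, and
    each distinct group touched costs one time unit.
    """
    return len({a // w for a in addresses})


w = 4                                      # assumed width (threads per round)
contiguous = [0, 1, 2, 3]                  # thread i reads cell i
stride = [0, 4, 8, 12]                     # thread i reads cell i * w
diagonal = [i * w + i for i in range(w)]   # thread i reads cell i * w + i

for name, pattern in [("contiguous", contiguous),
                      ("stride", stride),
                      ("diagonal", diagonal)]:
    print(name, dmm_cost(pattern, w), umm_cost(pattern, w))
```

Under these assumptions, contiguous access costs one unit on both models, stride access costs w units on both, and the diagonal pattern is conflict-free in the bank-based cost while still touching w address groups. This is only meant to illustrate why the access pattern matters; it is not the paper's own analysis.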
More From: International Journal of Parallel, Emergent and Distributed Systems