Abstract

Graphics accelerators are increasingly used for general-purpose high-performance computing applications because they offer a low-cost way to meet high-performance computing requirements. However, existing application software must be restructured to suit the accelerator paradigm. Explicit methods are inherently suitable for parallelization, whereas implicit methods are not, because their nodes must be processed in a specific order. The nodes can nevertheless be grouped into clusters such that the nodes within a cluster are independent of one another and can be processed concurrently. A CUDA kernel can then be launched separately for each cluster, processing its nodes in parallel while producing the same computational results as the sequential program. This paper presents one such successful attempt, together with the speedup obtained and the computational results.
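
The abstract does not say how the clusters are formed. The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes that the implicit method is a forward Gauss-Seidel-type sweep over a sparse system whose sparsity pattern is symmetric. It forms clusters by level scheduling: a node's level is one more than the highest level among its lower-numbered neighbours. It then processes one cluster at a time, with each loop iteration standing in for one CUDA kernel launch. All function names (`gauss_seidel_sequential`, `build_clusters`, `gauss_seidel_clustered`, `grid_laplacian`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation of a sparse system A x = b:
#   diag[i]       -> a_ii
#   neighbours[i] -> {j: a_ij} for the off-diagonal nonzeros of row i
# Assumption: the sparsity pattern is symmetric (j in row i iff i in row j).
Neighbours = List[Dict[int, float]]


def gauss_seidel_sequential(diag: List[float], neighbours: Neighbours,
                            b: List[float], x: List[float]) -> List[float]:
    """One forward Gauss-Seidel sweep in natural node order (the reference)."""
    x = list(x)
    for i in range(len(x)):
        s = b[i]
        for j, a in sorted(neighbours[i].items()):
            s -= a * x[j]  # lower neighbours already updated, upper ones still old
        x[i] = s / diag[i]
    return x


def build_clusters(neighbours: Neighbours) -> List[List[int]]:
    """Level scheduling: a node's level is 1 + the highest level of its
    lower-numbered neighbours, so nodes sharing a level never depend on each other."""
    level = [0] * len(neighbours)
    for i in range(len(neighbours)):
        level[i] = max((level[j] + 1 for j in neighbours[i] if j < i), default=0)
    clusters: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        clusters[lv].append(i)
    return clusters


def gauss_seidel_clustered(diag: List[float], neighbours: Neighbours,
                           b: List[float], x: List[float]) -> List[float]:
    """The same sweep, processed one cluster at a time."""
    x = list(x)
    for cluster in build_clusters(neighbours):
        # Each iteration stands in for one CUDA kernel launch: every node in the
        # cluster reads the same snapshot, as concurrent GPU threads would.
        snapshot = list(x)
        updates = {}
        for i in cluster:
            s = b[i]
            for j, a in sorted(neighbours[i].items()):
                s -= a * snapshot[j]
            updates[i] = s / diag[i]
        for i, v in updates.items():
            x[i] = v
    return x


def grid_laplacian(nx: int, ny: int):
    """5-point Laplacian on an nx-by-ny grid with row-major node numbering."""
    neighbours: Neighbours = [{} for _ in range(nx * ny)]
    for r in range(ny):
        for c in range(nx):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < ny and 0 <= cc < nx:
                    neighbours[r * nx + c][rr * nx + cc] = -1.0
    return [4.0] * (nx * ny), neighbours


if __name__ == "__main__":
    diag, nbrs = grid_laplacian(3, 3)
    print(build_clusters(nbrs))  # [[0], [1, 3], [2, 4, 6], [5, 7], [8]]
    b, x0 = [1.0] * 9, [0.0] * 9
    print(gauss_seidel_clustered(diag, nbrs, b, x0)
          == gauss_seidel_sequential(diag, nbrs, b, x0))  # True
```

On this 3-by-3 grid the clusters are the anti-diagonals of the grid (a wavefront ordering), so five cluster launches replace nine sequential node updates. The two sweeps agree exactly because every node reads the same neighbour values, in the same order, in both versions. Clusters must still be launched one after another, so the achievable parallelism is bounded by the cluster sizes.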
