Abstract
Image processing applications employ filters for purposes such as enhancing images and extracting features. Recent studies show that these filters account for a substantial share of execution time, so boosting their performance is crucial to improving the overall performance of image processing applications. Image processing filters require a significant amount of data sharing among the threads that filter neighboring pixels. Graphics Processing Units (GPUs) attempt to satisfy this demand by providing scratch-pad memory, shuffle instructions, and on-chip caches. However, we observe that these mechanisms are insufficient to provide fast and energy-efficient neighbor data sharing for image processing filters. In this paper, we propose a new hardware/software co-design mechanism for GPUs that provides fast and energy-efficient register-level neighbor data sharing for image filters. The proposed neighbor data exchange mechanism, called Neda, adds a register to each streaming processor (SP) that can be accessed by its neighboring SPs. Our experimental results show that Neda improves performance and energy consumption by 12.4 and 13.5 percent on average, respectively, compared to the NVIDIA SDK implementation of image processing filters. Moreover, Neda's performance is within 9.3 percent of that of an ideal GPU with zero-latency neighbor data exchange capability.
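To make the neighbor data sharing pattern concrete, the following is a minimal software sketch, not the Neda hardware mechanism and not the NVIDIA SDK implementation. It models one row of pixels as a list of "lanes", each holding one pixel, and computes a 3-tap box filter in which every lane reads the value held by its left and right neighbors. The function names `neighbor_exchange` and `box_filter_3tap`, the choice of a 1D box filter, and the edge handling (an out-of-range lane reuses its own value) are all assumptions made for illustration.

```python
def neighbor_exchange(values, offset):
    """For each lane i, return the value held by lane i + offset.

    Hypothetical helper: lanes whose source falls outside the row
    keep their own value (edge handling assumed for this sketch).
    """
    n = len(values)
    return [
        values[i + offset] if 0 <= i + offset < n else values[i]
        for i in range(n)
    ]


def box_filter_3tap(pixels):
    """Average each pixel with its left and right neighbors."""
    left = neighbor_exchange(pixels, -1)   # value from lane i - 1
    right = neighbor_exchange(pixels, +1)  # value from lane i + 1
    return [(l + c + r) / 3 for l, c, r in zip(left, pixels, right)]


row = [3, 6, 9, 12]
print(box_filter_3tap(row))  # prints [4.0, 6.0, 9.0, 11.0]
```

Each output pixel here needs two values owned by other lanes. On a GPU, that exchange is what scratch-pad memory, shuffle instructions, or caches have to serve, and it is the cost that a register-level exchange between neighboring SPs aims to reduce.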