Abstract

In recent years, the parallel computing performance of GPUs (graphics processing units) has grown rapidly. As a result, GPUs are widely used in computationally intensive applications such as image processing, deep learning, and artificial intelligence. Because these applications can be modeled as multiple GPU kernels, some of which depend on others, it is essential to find an efficient method for scheduling dependent kernels on GPU cores. Naively enforcing kernel dependencies by executing the kernels in sequence degrades performance. Furthermore, dependent kernels generally need to share data, so without proper scheduling, unnecessary memory accesses and copies are generated. Neural network models contain many operators that map to both parallel and dependent kernels. This paper proposes an efficient and clear method for building a framework that analyzes an ONNX (Open Neural Network Exchange) model, finds patterns of dependent kernels, and uses those patterns to schedule kernels on the GPU architecture. Preliminary experimental results show that, by combining neural network operators with appropriate memory policies, this technique improves overall performance by 8% and reduces the cache miss rate by 14% on average.
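The abstract's core idea, extracting kernel dependencies from a model graph so that independent kernels can be launched concurrently, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the node tuples below are a hypothetical stand-in for ONNX `GraphProto` nodes (each of which carries `input` and `output` tensor names), and the grouping by topological level is one simple way to expose kernels that have no dependency chain between them.

```python
from collections import defaultdict

# Hypothetical stand-in for an ONNX graph: each node is
# (name, op_type, inputs, outputs), mirroring the input/output
# tensor-name fields of ONNX NodeProto entries.
nodes = [
    ("conv1", "Conv", ["x", "w1"], ["t1"]),
    ("relu1", "Relu", ["t1"], ["t2"]),
    ("conv2", "Conv", ["x", "w2"], ["t3"]),
    ("add1",  "Add",  ["t2", "t3"], ["y"]),
]

def build_dependencies(nodes):
    """Map each node to the set of nodes whose outputs it consumes."""
    producer = {}
    for name, _, _, outputs in nodes:
        for out in outputs:
            producer[out] = name
    deps = defaultdict(set)
    for name, _, inputs, _ in nodes:
        for inp in inputs:
            if inp in producer:  # graph inputs/weights have no producer
                deps[name].add(producer[inp])
    return dict(deps)

def independent_groups(nodes, deps):
    """Partition nodes into topological levels: nodes in the same
    level share no dependency chain, so their kernels are candidates
    for concurrent launch; assumes `nodes` is topologically ordered."""
    level = {}
    for name, *_ in nodes:
        level[name] = 1 + max((level[d] for d in deps.get(name, ())),
                              default=-1)
    groups = defaultdict(list)
    for name, lv in level.items():
        groups[lv].append(name)
    return [groups[lv] for lv in sorted(groups)]
```

Here `conv1` and `conv2` land in the same group because neither consumes the other's output, while `relu1` and `add1` must wait for their producers, which is exactly the dependent-kernel pattern the paper's framework looks for.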
