Abstract

Nowadays, various AI applications based on Convolutional Neural Networks (CNNs) are widely deployed on GPU-accelerated devices. However, due to the lack of visibility into GPU internal scheduling, it is challenging to accurately model the performance of CNN inference tasks or to estimate the latency of CNN tasks that are executing or waiting on the GPU. This hinders multi-model scheduling across multiple devices and real-time CNN inference. Therefore, in this paper, we propose a time estimation method that predicts the forward execution time of a convolutional layer of arbitrary shape on a GPU. The proposed method divides an explicit General Matrix Multiplication (GEMM) convolution operation into a series of individually estimable GPU sub-operations and constructs performance models at the level of these sub-operations rather than at the level of hardware instructions or entire models. Moreover, the proposed method can be easily adapted to different hardware devices or underlying algorithm implementations, since it focuses on how execution time varies with the input data scale rather than on specific instructions or hardware actions. In experiments on four typical CUDA-compatible platforms, the proposed method achieves an average error rate of less than 5% for convolutional layers in several practical CNN models, and an error rate of about 8% when estimating the GEMM convolution implementations provided by the cuDNN library. The experiments show that the proposed method can predict the forward execution time of convolutional layers of arbitrary size in CNN inference tasks on different GPU models.
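The following is a minimal sketch (not the authors' code) of the idea summarized above: treat an explicit GEMM convolution as a sequence of sub-operations, describe each sub-operation by a data-scale feature derived from the layer shape, and estimate the layer's forward time by summing per-sub-operation latency models. The sub-operation split (im2col, GEMM, bias add) and the linear latency models with placeholder coefficients are illustrative assumptions; in the paper's setting such models would be fitted from profiling measurements on each target GPU.

```python
# Sketch: estimate conv forward time by summing sub-operation latency models.
# All model coefficients below are hypothetical placeholders, not real
# calibration data for any GPU.

from dataclasses import dataclass


@dataclass
class ConvShape:
    n: int        # batch size
    c_in: int     # input channels
    h: int        # input height
    w: int        # input width
    c_out: int    # output channels
    k: int        # square kernel size
    stride: int = 1
    pad: int = 0

    def out_hw(self):
        h_out = (self.h + 2 * self.pad - self.k) // self.stride + 1
        w_out = (self.w + 2 * self.pad - self.k) // self.stride + 1
        return h_out, w_out


def sub_operation_work(shape: ConvShape) -> dict:
    """Data-scale features for each sub-operation of an explicit GEMM conv."""
    h_out, w_out = shape.out_hw()
    m = shape.c_out                      # GEMM rows
    k_dim = shape.c_in * shape.k ** 2    # GEMM inner dimension
    n_cols = shape.n * h_out * w_out     # GEMM columns
    return {
        "im2col": k_dim * n_cols,        # elements written by the unfold step
        "gemm": m * k_dim * n_cols,      # multiply-accumulate count
        "bias_add": m * n_cols,          # output elements touched
    }


# Hypothetical per-sub-operation models: latency_us = a * work + b,
# standing in for coefficients regressed from profiling runs on one GPU.
MODEL = {
    "im2col":   (2.0e-6, 5.0),
    "gemm":     (1.5e-7, 10.0),
    "bias_add": (1.0e-6, 2.0),
}


def estimate_forward_us(shape: ConvShape) -> float:
    """Sum the estimated latency of every sub-operation."""
    work = sub_operation_work(shape)
    return sum(a * work[name] + b for name, (a, b) in MODEL.items())


if __name__ == "__main__":
    layer = ConvShape(n=1, c_in=64, h=56, w=56, c_out=64, k=3, stride=1, pad=1)
    print(f"estimated forward time: {estimate_forward_us(layer):.1f} us")
```

Because the estimate depends only on shape-derived work features, adapting it to another GPU or another GEMM implementation amounts to re-fitting the per-sub-operation coefficients, which is the portability property the abstract emphasizes.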
