Abstract

In deep learning, one popular subclass of the multilayer perceptron is the Convolutional Neural Network (CNN), optimized for tasks such as image classification and semantic segmentation. With a large number of floating-point operations and relatively little data transfer during the training phase, CNNs are well suited to parallel architectures. The high accuracy of CNNs, however, comes at the cost of substantial compute and memory demands. In this article, we present a comparative analysis of a pre-trained CNN for handwritten-digit recognition across different processing platforms. Studying the network's performance as a function of the underlying parallel platform offers insight into the parameters and factors that influence inference on state-of-the-art Graphics Processing Units (GPUs) and on systolic arrays such as Eyeriss and the Tensor Processing Unit (TPU). Through inference-time analysis, we observed that systolic arrays can run inference up to 58.7 times faster than a Turing-architecture GPU. We also show that while existing GPUs utilize their available resources more efficiently (up to 32%) than TPUs, the efficiency of application-specific systolic arrays can be on par with that of GPUs. Finally, we present results from three customized systolic-array-based platforms that designers can adopt when deciding on a hardware optimization goal.
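The comparison above rests on two measurable quantities: per-image inference time and resource-utilization efficiency, the latter commonly taken as achieved throughput divided by the platform's peak throughput. As a rough illustration of how such inference-time measurements are typically taken, the sketch below times a small MNIST-style CNN on whatever devices are available. This is a minimal sketch only: the SmallCNN architecture, the run counts, and the device list are illustrative assumptions, not the authors' model or hardware setup.

import time
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical stand-in for the paper's pre-trained digit-recognition CNN."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 28x28 -> 14x14
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def measure_latency(model, device, runs=100):
    """Average per-inference wall-clock time for a single MNIST-sized input."""
    model = model.to(device).eval()
    x = torch.randn(1, 1, 28, 28, device=device)  # one 28x28 grayscale digit
    with torch.no_grad():
        for _ in range(10):                       # warm-up iterations
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()              # flush queued GPU work
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    model = SmallCNN()
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for name in devices:
        ms = measure_latency(model, torch.device(name)) * 1e3
        print(f"{name}: {ms:.3f} ms per inference")

Batch size 1 is used here because single-sample latency is what matters for interactive inference; throughput comparisons of the kind reported in the article would instead sweep the batch size and divide by images processed.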
