Abstract
Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is commonly used as a proxy for neural network latency, it is often a poor one. To obtain a better approximation of latency on a mobile CPU, the research community builds lookup tables of all possible layers and sums their measured latencies; this requires only a small number of experiments. Unfortunately, on a mobile GPU this method is not applicable in a straightforward way and shows low precision. In this work, we treat latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we develop tools that provide a convenient way to conduct massive experiments on different target devices, focusing on mobile GPUs. Once the dataset is collected, one can train a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Bench-101 dataset for two different mobile GPUs.
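The contrast between the two strategies can be made concrete with a small sketch. The Python code below is illustrative only: the layer names, per-layer timings, architecture encoding, and the synthetic "measured" latencies are hypothetical stand-ins for real on-device measurements, and the choice of regressor is an assumption rather than the paper's exact model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# (1) Lookup-table proxy: sum pre-measured per-layer latencies.
# Hypothetical per-layer timings (ms); a real table would be built by
# benchmarking each layer configuration once on the target CPU.
layer_latency_ms = {"conv3x3": 1.8, "conv1x1": 0.6, "maxpool": 0.3}

def lookup_estimate(layers):
    """Additive estimate: assumes per-layer latencies simply sum up,
    which tends to hold on mobile CPUs but not on mobile GPUs."""
    return sum(layer_latency_ms[l] for l in layers)

# (2) Data-driven estimate: encode whole architectures as feature
# vectors, pair them with end-to-end latencies measured on the target
# device, and fit a regression model on the collected dataset.
def encode(layers):
    # Toy encoding: counts of each layer type.
    return [layers.count(t) for t in layer_latency_ms]

archs = [list(rng.choice(list(layer_latency_ms), size=8)) for _ in range(200)]
X = np.array([encode(a) for a in archs])
# Synthetic "measured" latencies: the additive part distorted by a
# device-specific nonlinearity, standing in for real GPU measurements.
y = np.array([lookup_estimate(a) for a in archs]) * (1.0 + 0.2 * rng.random(200))

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print("lookup estimate (ms):    ", lookup_estimate(archs[0]))
print("regression estimate (ms):", model.predict(X[:1])[0])
```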
Highlights
Algorithms based on convolutional neural networks achieve high performance in numerous computer vision tasks, such as image recognition [1,2], object detection, and segmentation [3], as well as in many other areas [4]
We present the construction and analysis of a latency dataset built for two mobile devices over a subset of the NAS-Bench-101 neural architecture search (NAS) space (see the sketch after this list)
We focus on a mobile GPU
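As a hedged illustration of what sampling from that search space looks like, here is a sketch using the public nasbench package; the record-file path is a placeholder, and the matrix/ops example follows the package's documentation. Note that NAS-Bench-101 records accuracy rather than latency: building the per-device latency dataset (converting each cell to a mobile model and timing it on the GPU) is the part a tool such as LETI automates.

```python
from nasbench import api  # public NAS-Bench-101 package

# Placeholder path to the downloaded NAS-Bench-101 record file.
nasbench = api.NASBench("nasbench_only108.tfrecord")

# A cell is a DAG over at most 7 nodes: an upper-triangular adjacency
# matrix plus one operation label per node (example from the package docs).
spec = api.ModelSpec(
    matrix=[[0, 1, 1, 1, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],
    ops=["input", "conv1x1-bn-relu", "conv3x3-bn-relu", "conv3x3-bn-relu",
         "conv3x3-bn-relu", "maxpool3x3", "output"])

if nasbench.is_valid(spec):
    data = nasbench.query(spec)  # accuracy/params recorded in the benchmark
    print(data["test_accuracy"], data["trainable_parameters"])
```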
Summary
Algorithms based on convolutional neural networks achieve high performance in numerous computer vision tasks, such as image recognition [1,2], object detection, segmentation [3], and many other areas [4]. Many applications require computer vision problems to be solved in real time on end devices such as mobile phones, embedded devices, car computers, etc. Each of those devices has its own architecture, hardware, and software. For example, the actual speedup achieved by the fast and accurate ShuffleNet [5] on a Qualcomm Snapdragon 820 processor is more than 1.5× smaller than the theoretical one in comparison with MobileNet [6]. This is quite a widespread phenomenon; more examples can be found in the TensorFlow [7] Lite (TFLite) benchmark comparison [8]. Further results of TensorFlow Lite performance benchmarks for well-known models on some Android and iOS devices can be found at https://www.tensorflow.org/lite/performance/benchmarks (accessed on 19 August 2021)
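The gap between theoretical and measured speedups can be checked directly by timing a converted model with the TFLite Python interpreter. A minimal sketch follows; the model path is a placeholder, the run counts are arbitrary, and on a phone one would instead use the TFLite benchmark tool or load a GPU delegate, which is platform-specific and not shown here.

```python
import time
import numpy as np
import tensorflow as tf

# "model.tflite" is a placeholder for any converted model.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input of the right shape/dtype suffices for latency measurement.
x = np.random.random_sample(tuple(inp["shape"])).astype(inp["dtype"])

# Warm-up: first invocations often include one-off allocation costs.
for _ in range(5):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()

# Timed runs; report the median to damp scheduler noise.
times_ms = []
for _ in range(50):
    interpreter.set_tensor(inp["index"], x)
    start = time.perf_counter()
    interpreter.invoke()
    times_ms.append((time.perf_counter() - start) * 1000.0)

print(f"median latency: {np.median(times_ms):.2f} ms")
```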