Abstract

Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is commonly used as a proxy for neural network latency, it is not always a good one. To obtain a better approximation of latency on a mobile CPU, the research community uses lookup tables of all possible layers, which requires only a small number of experiments. Unfortunately, on a mobile GPU this method is not applicable in a straightforward way and shows low precision. In this work, we treat latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To this end, we develop tools that provide a convenient way to conduct massive experiments on different target devices, focusing on mobile GPUs. Once the dataset is evaluated, one can train a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Bench-101 dataset for two different mobile GPUs.
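To make the contrast between the two estimation strategies concrete, the sketch below compares a per-layer lookup-table sum with a regression model fitted on measured end-to-end latencies. This is only an illustration of the general idea, not LETI's actual implementation: all layer descriptors, feature encodings, and timings are hypothetical placeholders.

```python
# Minimal sketch of two latency-estimation strategies:
# (1) summing per-layer latencies from a lookup table (common for mobile CPUs),
# (2) fitting a regression model on measured whole-network latencies
#     (the kind of approach advocated above for mobile GPUs).
# All layer descriptors, features, and timings are hypothetical placeholders.

from sklearn.ensemble import RandomForestRegressor

# (1) Lookup-table estimate: assumes latency is additive over layers.
LAYER_LATENCY_MS = {            # hypothetical per-layer measurements
    ("conv3x3", 32): 1.8,
    ("conv1x1", 64): 0.6,
    ("maxpool", 32): 0.3,
}

def lookup_estimate(layers):
    """Sum per-layer latencies; ignores op fusion and scheduling effects."""
    return sum(LAYER_LATENCY_MS[layer] for layer in layers)

# (2) Regression estimate: learn latency from network-level features
# measured on the target device (here, hypothetical op-count features).
X_train = [[2, 1, 1], [4, 0, 2], [1, 3, 0]]   # op counts per network
y_train = [4.5, 7.9, 3.1]                      # measured latencies, ms

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print(lookup_estimate([("conv3x3", 32), ("conv1x1", 64), ("maxpool", 32)]))
print(model.predict([[3, 1, 1]]))  # predicted latency for an unseen network
```

On a mobile GPU, where kernel fusion and scheduling break the additivity assumption of strategy (1), a learned model of type (2) can capture device-specific behavior directly from the experimental data.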

Highlights

  • Algorithms based on convolutional neural networks can achieve high performance in numerous computer vision tasks, such as image recognition [1,2], object detection, segmentation [3], and many other areas [4]

  • We construct and analyze a dataset for two mobile devices on a subset of the NAS-Bench-101 neural architecture search (NAS) space

  • We focus on a mobile GPU


Introduction

Algorithms based on convolutional neural networks can achieve high performance in numerous computer vision tasks, such as image recognition [1,2], object detection, segmentation [3], and many other areas [4]. Many applications require computer vision problems to be solved in real time on end devices, such as mobile phones, embedded devices, car computers, etc. Each of those devices has its own architecture, hardware, and software. For example, the actual speedup achieved by the fast and accurate ShuffleNet [5] on a Qualcomm Snapdragon 820 processor is more than 1.5× less than the theoretical one in comparison with MobileNet [6]. This is a quite widespread phenomenon; more examples can be found in the TensorFlow [7] Lite (TFLite) benchmark comparison [8]. More results of TensorFlow Lite performance benchmarks when running well-known models on some Android and iOS devices can be found at https://www.tensorflow.org/lite/performance/benchmarks (accessed on 19 August 2021).
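To illustrate why theoretical FLOP counts and measured speedups can diverge, the following sketch computes FLOPs for a standard versus a depthwise-separable 3×3 convolution (the MobileNet-style building block [6]). The tensor shapes are hypothetical examples, and the large theoretical reduction it reports is typically not matched by on-device latency.

```python
# Theoretical FLOP counts for a standard vs. depthwise-separable 3x3
# convolution (MobileNet-style [6]); the shapes below are hypothetical.
# A large theoretical FLOP reduction often yields a much smaller measured
# speedup on real hardware, which is the gap discussed above.

def conv_flops(h, w, c_in, c_out, k=3):
    # multiply-accumulates for a standard k x k convolution
    return h * w * c_in * c_out * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k=3):
    # depthwise k x k convolution followed by a pointwise 1x1 convolution
    return h * w * c_in * k * k + h * w * c_in * c_out

h = w = 56
c_in, c_out = 64, 128
std = conv_flops(h, w, c_in, c_out)
sep = depthwise_separable_flops(h, w, c_in, c_out)
print(f"standard: {std:,} FLOPs, separable: {sep:,} FLOPs, "
      f"theoretical speedup: {std / sep:.1f}x")
# On-device latency rarely improves by the same factor: memory traffic,
# kernel launch overhead, and scheduler behavior dominate for small ops.
```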
