nn-Meter

Li Lyna Zhang, Jianyu Wei, Ningxin Zheng, Ting Cao, Shihao Han, Yunxin Liu

doi:10.1145/3529706.3529712

Abstract

Inference latency has become a crucial metric for running Deep Neural Network (DNN) models on mobile and edge devices. Predicting the latency of DNN inference is therefore highly desirable for the many tasks where measuring latency on real devices is infeasible or too costly. Prediction is very challenging, however, and existing approaches fail to achieve high accuracy because runtime optimizations on diverse edge devices cause model-inference latency to vary. In this paper, we propose and develop nn-Meter, a novel and efficient system that accurately predicts DNN inference latency on diverse edge devices. The key idea of nn-Meter is to divide a whole model inference into kernels, i.e., the execution units on a device, and to conduct kernel-level prediction. nn-Meter builds atop two key techniques: (i) kernel detection, which automatically detects the execution units of model inference via a set of well-designed test cases; and (ii) adaptive sampling, which efficiently samples the most beneficial configurations from a large space to build accurate kernel-level latency predictors. nn-Meter achieves significantly high prediction accuracy on four types of edge devices.
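The kernel-level idea in the abstract can be illustrated with a minimal Python sketch. It is not nn-Meter's implementation or API. The fusion rules, the single cost feature per operator, the per-kernel predictor functions, and every name in it are hypothetical. It also assumes that a model's latency can be estimated by summing the predicted latencies of its kernels.

```python
from typing import Callable, Dict, List, Set, Tuple

# (operator type, illustrative cost feature); the feature is a made-up stand-in
# for whatever configuration a real kernel-level predictor would consume.
Op = Tuple[str, float]


def split_into_kernels(ops: List[Op], fusion_rules: Set[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Greedily merge adjacent operators that the (assumed) runtime fuses.

    fusion_rules holds (previous_op, next_op) pairs that are assumed to be
    fused into one execution unit on the target device.
    """
    kernels: List[Tuple[List[str], float]] = []
    for op_type, cost in ops:
        if kernels and (kernels[-1][0][-1], op_type) in fusion_rules:
            names, total = kernels[-1]
            kernels[-1] = (names + [op_type], total + cost)
        else:
            kernels.append(([op_type], cost))
    return [("-".join(names), total) for names, total in kernels]


def predict_model_latency(
    ops: List[Op],
    fusion_rules: Set[Tuple[str, str]],
    predictors: Dict[str, Callable[[float], float]],
) -> float:
    """Sum the per-kernel predictions to estimate whole-model latency."""
    return sum(predictors[name](cost) for name, cost in split_into_kernels(ops, fusion_rules))


# Toy model and hypothetical fusion rules; none of these values are measurements.
example_ops: List[Op] = [("conv", 10.0), ("bn", 1.0), ("relu", 1.0), ("pool", 2.0), ("fc", 4.0)]
example_rules: Set[Tuple[str, str]] = {("conv", "bn"), ("bn", "relu")}
example_predictors: Dict[str, Callable[[float], float]] = {
    "conv-bn-relu": lambda c: 0.5 * c,  # placeholder coefficients, not fitted
    "pool": lambda c: 0.25 * c,
    "fc": lambda c: 1.0 * c,
}

if __name__ == "__main__":
    print(split_into_kernels(example_ops, example_rules))
    print(predict_model_latency(example_ops, example_rules, example_predictors))
```

In this sketch the fusion rules are supplied by hand. The abstract describes them as being detected automatically through test cases on the device, and describes the per-kernel predictors as being trained on adaptively sampled configurations rather than written as fixed formulas.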

Journal: GetMobile: Mobile Computing and Communications
Publication Date: Mar 30, 2022
Citations: 1

