Abstract

Geekbench is one of the most widely referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic, but some aim to simulate real-world behavior. Because hardware profiling features on mobile devices are largely unavailable to the public, Geekbench's microarchitectural behavior has rarely been reported, and despite its popularity its microarchitecture characteristics on mobile devices are hard to find. In this paper, we report a thorough experimental characterization of Geekbench performance with detailed performance metrics. The study also identifies the impact of mobile system-on-chip (SoC) microarchitecture features such as the cache subsystem, instruction-level parallelism, and branch performance. From the study, we identify the bottlenecks of the workloads, especially in the cache subsystem: a change in data-set size significantly affects the performance score on some systems and can undermine the fairness of the CPU benchmark. In our experiments, a Samsung Exynos 9820-based platform was used as the device under test, running binaries built with the Android Native Development Kit (NDK). The Exynos 9820 is a superscalar processor capable of dual-issuing some instructions. To support the performance analysis, we enabled the collection of performance events through performance monitoring unit (PMU) registers; the PMU is a set of hardware performance counters built into microprocessors that store counts of hardware-related activities. The ARM DS-5 tool was used to collect runtime PMU profiles, including OS-level performance data, and both functional and microarchitectural performance profiles were studied in full.
After this comparative study, readers will better understand mobile architecture behavior, which will help them evaluate which benchmark is preferable for a fair performance comparison.

Highlights

  • Analysis of workload execution and identification of software and hardware performance barriers provide critical engineering benefits, including guidance on software optimization, hardware design tradeoffs, configuration tuning, and comparative platform assessments

  • We describe a microarchitecture performance analysis of Primate Labs' mobile performance evaluation workload, Geekbench

  • A change in data-set size significantly affects the performance score on some systems and can undermine the fairness of the CPU benchmark



Introduction

Analysis of workload execution and identification of software and hardware performance barriers provide critical engineering benefits, including guidance on software optimization, hardware design tradeoffs, configuration tuning, and comparative platform assessments. We describe a microarchitecture performance analysis of Primate Labs' mobile performance evaluation workload, Geekbench. The analysis exposes a weakness of the workload: it can reach maximum performance on certain devices solely through larger cache sizes, which most real-world applications do not need. Using the collected PMU data, we explain characteristics including instructions per cycle (IPC), cache-related counters, branch-prediction-related counters, translation lookaside buffer (TLB)-related counters, and other performance-related counters, all of which are strongly related to runtime performance.
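The metrics named above are simple ratios over raw PMU event counts. As a minimal sketch of how they are derived, the snippet below computes IPC, L1 data-cache miss rate, and branch misprediction rate from typical ARM PMU event names (INST_RETIRED, CPU_CYCLES, L1D_CACHE, L1D_CACHE_REFILL, BR_PRED, BR_MIS_PRED); the counter values are placeholders for illustration, not measurements from the paper.

```python
# Derive common microarchitecture metrics from raw PMU event counts.
# Event names follow typical ARM PMU conventions; the sample values
# below are hypothetical, not data collected in this study.

def derive_metrics(counts):
    """Return IPC, L1D miss rate, and branch misprediction rate."""
    return {
        "ipc": counts["INST_RETIRED"] / counts["CPU_CYCLES"],
        "l1d_miss_rate": counts["L1D_CACHE_REFILL"] / counts["L1D_CACHE"],
        "branch_mispred_rate": counts["BR_MIS_PRED"] / counts["BR_PRED"],
    }

sample = {
    "INST_RETIRED": 2_000_000,   # instructions architecturally executed
    "CPU_CYCLES": 1_000_000,     # processor cycles
    "L1D_CACHE": 500_000,        # L1 data cache accesses
    "L1D_CACHE_REFILL": 25_000,  # L1 data cache refills (misses)
    "BR_PRED": 300_000,          # predictable branches speculatively executed
    "BR_MIS_PRED": 6_000,        # mispredicted branches
}

metrics = derive_metrics(sample)
print(metrics)  # ipc = 2.0, l1d_miss_rate = 0.05, branch_mispred_rate = 0.02
```

In practice these counts come from the PMU registers (here, collected with ARM DS-5); the ratios are what make runs on different SoCs comparable, since raw counts scale with run length.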

Motivation
Tested Device
Mobile Device Workloads
Experiment
Results
Instruction Mix
Cache Performance
Branch
Other Performance Metrics
Conclusions