Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement.

Xuzhen He

doi:10.1371/journal.pone.0282265

Abstract

The recent dramatic progress in machine learning is partially attributed to the availability of high-performance computers and development tools. The accelerated linear algebra (XLA) compiler is one such tool: it automatically optimises array operations (mostly by fusing them to reduce memory operations) and compiles the optimised operations into high-performance programs specific to the target computing platform. Like machine-learning models, numerical models are often expressed in array operations, and thus their performance can be boosted by XLA. This study is the first of its kind to examine the efficiency of XLA for numerical models, and the efficiency is examined stringently by comparing XLA's performance with that of optimal implementations. Two shared-memory computing platforms are examined: the CPU platform and the GPU platform. To obtain optimal implementations, the computing speed and its optimisation are rigorously studied by considering different workloads and the corresponding computer performance. Two simple equations are found to faithfully model the computing speed of numerical models with very few easily measurable parameters. Regarding operation optimisation within XLA, results show that models expressed in low-level operations (e.g., slice, concatenation, and arithmetic operations) are successfully fused, while high-level operations (e.g., convolution and roll) are not. Regarding compilation within XLA, results show that for the CPU platform of certain computers, and for certain simple numerical models on the GPU platform, XLA achieves high efficiency (> 80%) for large problems and acceptable efficiency (10% to 80%) for medium-size problems; the gap comes from the overhead cost of Python. Unsatisfactory performance is found for the CPU platform of other computers (where operations are compiled in a non-optimal way) and for high-dimensional complex models on the GPU platform, where each GPU thread in XLA handles 4 (single precision) or 2 (double precision) output elements in the hope of exploiting the high-performance instructions that can read or write 4 or 2 floating-point numbers at once. However, these instructions are rarely used in the generated code for complex models, and performance is negatively affected. Therefore, flags should be added to control the compilation for these non-optimal scenarios.
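To make the distinction between low-level and high-level operations concrete, the following is a minimal JAX sketch (not code from the paper) of one stencil written two ways: once with slices and arithmetic, and once with roll. The function names, the 1-D second-difference stencil, and the array size are all assumptions chosen for illustration; both versions are compiled through XLA with `jax.jit` and give the same values on the interior points.

```python
import jax
import jax.numpy as jnp


@jax.jit
def second_difference_slices(u):
    # Low-level formulation: only slices and arithmetic.
    # Returns u[i-1] - 2*u[i] + u[i+1] for the interior points i = 1..n-2.
    return u[:-2] - 2.0 * u[1:-1] + u[2:]


@jax.jit
def second_difference_roll(u):
    # High-level formulation: the same stencil written with roll.
    # jnp.roll(u, 1)[i] == u[i-1] and jnp.roll(u, -1)[i] == u[i+1]
    # (with wrap-around at the ends, which is discarded below).
    full = jnp.roll(u, 1) - 2.0 * u + jnp.roll(u, -1)
    return full[1:-1]


# Illustrative input (an assumption, not a benchmark from the study).
u = jnp.sin(jnp.linspace(0.0, 1.0, 1024))

a = second_difference_slices(u)
b = second_difference_roll(u)
```

The two functions are numerically equivalent, so any difference in run time comes from how XLA optimises and compiles them. One way to see what the compiler produced on a given machine is to print the optimised program with `second_difference_slices.lower(u).compile().as_text()` and compare it with the roll version; the exact output depends on the JAX version and the platform.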


Journal: PLoS ONE
Publication Date: Feb 24, 2023
Citations: 1
License type: CC BY 4.0


