Abstract

Model inference efficiency is one of the most important concerns for modern machine learning systems. Finding the optimal execution configuration is a major challenge: the search space is huge, and combinations of kernel fusion, memory tiling, and thread allocation strategies lead to highly variable and hard-to-predict inference performance. The problem is particularly pronounced in models with large parameter matrices, such as Transformers. In this paper, we develop NIOT, a general and powerful framework for inference optimization that achieves high efficiency for prevailing Transformer-like models on CPUs. To take full advantage of modern CPU features such as SIMD units and the cache hierarchy, NIOT employs a range of techniques to derive optimization strategies tailored to the target Transformer model. Our C++ implementation of NIOT shows significant performance improvements over popular, well-optimized model-serving runtimes such as PyTorch and ONNXRuntime.
