EMS-i: An Efficient Memory System Design with Specialized Caching Mechanism for Recommendation Inference

Yitu Wang, Shiyu Li, Hai Li, Yiran Chen, Qilin Zheng, Andrew Chang

doi:10.1145/3609384

Abstract

Recommendation systems are widely embedded in Internet services. For example, Meta’s deep learning recommendation model (DLRM) achieves high predictive accuracy for click-through rate while processing large-scale embedding tables. The SparseLengthSum (SLS) kernel dominates DLRM inference time because of its intensive, irregular memory accesses to the embedding vectors. Some prior works directly adopt near data processing (NDP) solutions to obtain higher memory bandwidth and accelerate SLS. However, their inferior memory hierarchy yields a low performance-cost ratio and fails to fully exploit data locality. Although some software-managed cache policies have been proposed to improve the cache hit rate, the resulting cache miss penalty is unacceptable given the high overhead of executing the corresponding programs and of the communication between the host and the accelerator. To address these issues, we propose EMS-i, an efficient memory system design that integrates a Solid State Drive (SSD) into the memory hierarchy using Compute Express Link (CXL) for recommendation system inference. We specialize the caching mechanism according to the characteristics of various DLRM workloads and propose a novel prefetching mechanism to further improve performance. In addition, we carefully design the inference kernel and develop a customized mapping scheme for the SLS operation, considering the multi-level parallelism in SLS and the data locality within a batch of queries. Compared to state-of-the-art NDP solutions, EMS-i achieves up to 10.9× speedup over RecSSD and performance comparable to RecNMP with 72% energy savings. EMS-i also reduces memory cost by up to 8.7× and 6.6× relative to RecSSD and RecNMP, respectively.
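As a rough illustration of why SLS is dominated by irregular memory accesses, the following minimal sketch shows a pooled embedding lookup in plain Python. It is not the EMS-i kernel; the function name `sparse_lengths_sum` and its argument layout (a flat index list plus per-query lengths) are assumptions chosen for illustration, loosely following the usual description of the operation.

```python
from typing import List, Sequence


def sparse_lengths_sum(
    table: Sequence[Sequence[float]],
    indices: Sequence[int],
    lengths: Sequence[int],
) -> List[List[float]]:
    """Sum the embedding rows selected by each query (illustrative sketch).

    table   : embedding table, one row (vector) per item
    indices : flat list of row ids for all queries, back to back
    lengths : number of row ids that belong to each query
    """
    if sum(lengths) != len(indices):
        raise ValueError("lengths must sum to len(indices)")

    dim = len(table[0]) if table else 0
    pooled: List[List[float]] = []
    pos = 0
    for n in lengths:
        acc = [0.0] * dim
        # Each idx may point anywhere in the table, so consecutive
        # row reads are scattered rather than sequential.
        for idx in indices[pos:pos + n]:
            row = table[idx]
            for d in range(dim):
                acc[d] += row[d]
        pooled.append(acc)
        pos += n
    return pooled


if __name__ == "__main__":
    table = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
    # Query 0 pools rows 0 and 2; query 1 pools row 1.
    print(sparse_lengths_sum(table, [0, 2, 1], [2, 1]))
```

Each output vector needs only a handful of additions, but every addition depends on a row fetched from an effectively arbitrary location in a very large table, which is why the abstract frames the problem in terms of memory bandwidth, caching, and data locality rather than compute.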

Journal: ACM Transactions on Embedded Computing Systems
Publication Date: Sep 9, 2023
Citations: 2

Similar Papers

Nearest data processing in GPU
Hossein Bitalebi ... Masoumeh Ebrahimi
Sustainable Computing: Informatics and Systems | VOL. 44
28 Oct 2024

FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction
Bahar Asgari ... Sung-Kyu Lim
01 Feb 2021

RNA: Reconfigurable LSTM Accelerator with Near Data Approximate Processing
Yu Gong ... Wei Ge
01 Dec 2019

Enabling near-data processing in distributed object storage systems
Ian F. Adams ... Michael P. Mesnier
27 Jul 2021
