Abstract
Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we study real-world DLRM data processing pipelines taken from our compute cluster at Netflix to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. Our studies lead us to design a new solution for data pipeline optimization, InTuneX. InTuneX is designed for production-scale, multi-node recommender data pipelines, and it unifies and tackles the challenges of both intra- and inter-node pipeline optimization. We achieve this with a multi-agent reinforcement learning (RL) design that simultaneously optimizes node assignments at the cluster level and CPU assignments within nodes. Our experiments show that InTuneX can build optimized data pipeline configurations within minutes. We apply InTuneX to our cluster and find that it increases single-node data ingestion throughput by as much as 2.29X versus state-of-the-art optimizers, while improving the cost-efficiency of multi-node pipelines by 15-25%.