Data Life Aware Model Updating Strategy for Stream-based Online Deep Learning

Wei Rang, Dazhao Cheng, Wei Chen, Donglin Yang, Kun Suo

doi:10.1109/cluster49012.2020.00049

Abstract

Many deep learning applications are deployed in dynamic environments that change over time, so their models must be continuously updated with streaming data to keep track of evolving data trends. However, most state-of-the-art learning frameworks are built for offline training and offer little support for online model updating strategies. In this work, we propose and implement iDlaLayer, a thin middleware layer on top of existing training frameworks that streamlines the support and implementation of online deep learning applications. To achieve both good model quality and fast data incorporation, we design a Data Life Aware model updating strategy (DLA), which builds training samples according to the contributions of data from different life stages and accounts for the training cost of each model update. We evaluate iDlaLayer through both simulations and experiments based on TensorFlowOnSpark with three representative online learning workloads. Our experimental results show that, compared to the periodic update strategy, iDlaLayer reduces the overall elapsed time of MNIST, Criteo and PageRank by 11.3%, 28.2% and 15.2%, respectively. Compared to the traditional continuous training method, it further reduces training cost by 20% on average and improves model quality by about 5%.
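The abstract does not say how DLA weighs data from different life stages. What follows is a minimal Python sketch of the general idea, built on our own assumptions. Samples are bucketed into three hypothetical stages ("young", "mature", "old") by their age in time steps. Each stage receives a fixed share of a per-update sample budget, and that budget stands in for the training cost of one update. The names `Sample`, `life_stage` and `build_training_set`, the age thresholds, and the weights are all illustrative. None of them is taken from iDlaLayer.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    value: int
    arrival: int  # time step at which the sample arrived in the stream


def life_stage(sample, now, young_age, mature_age):
    """Hypothetical three-stage split by sample age (illustrative only)."""
    age = now - sample.arrival
    if age < young_age:
        return "young"
    if age < mature_age:
        return "mature"
    return "old"


def build_training_set(buffer, now, budget, weights,
                       young_age=10, mature_age=50, seed=0):
    """Draw up to `budget` samples, splitting the budget across life stages.

    `budget` is an assumed proxy for the training cost of one update;
    `weights` maps each stage to an integer share of that budget.
    """
    rng = random.Random(seed)
    stages = {"young": [], "mature": [], "old": []}
    for s in buffer:
        stages[life_stage(s, now, young_age, mature_age)].append(s)

    total_w = sum(weights.values())
    chosen = []
    for stage, pool in stages.items():
        # Integer share of the budget for this stage.
        quota = budget * weights[stage] // total_w
        # A stage with fewer samples than its quota contributes all it has.
        k = min(quota, len(pool))
        chosen.extend(rng.sample(pool, k))
    return chosen


if __name__ == "__main__":
    buf = [Sample(value=i, arrival=i) for i in range(100)]
    picked = build_training_set(buf, now=100, budget=20,
                                weights={"young": 5, "mature": 3, "old": 2})
    print(len(picked))
```

In this sketch, a stage that holds fewer samples than its quota simply contributes everything it has, so the training set can fall short of the budget. A fuller version might reassign the unused quota to the other stages.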

Full Text